[CRIU] [PATCH] tcp: Try harder to restore recv queue
Andrew Vagin
avagin at virtuozzo.com
Thu Dec 24 13:22:30 PST 2015
On Thu, Dec 24, 2015 at 10:50:46PM +0300, Pavel Emelyanov wrote:
> On restore we try to put data back into recv queue with quite
> big chunks. However the kernel doesn't try hard to split the
> data into packets in repair mode for this queue and just
> allocates the linear skb of the given size. If the size is
> moderately big, the allocation is liable to fail, since slab doesn't
> reliably allocate memory over 4k.
>
> So, when a send with a big chunk fails on the recv queue, shrink
> the chunk and try again.
>
Acked-by: Andrew Vagin <avagin at virtuozzo.com>
> Signed-off-by: Pavel Emelyanov <xemul at parallels.com>
> ---
> sk-tcp.c | 25 ++++++++++++++++++++-----
> 1 file changed, 20 insertions(+), 5 deletions(-)
>
> diff --git a/sk-tcp.c b/sk-tcp.c
> index 9f3bf0d..b5e66b0 100644
> --- a/sk-tcp.c
> +++ b/sk-tcp.c
> @@ -461,7 +461,7 @@ static int restore_tcp_seqs(int sk, TcpStreamEntry *tse)
>
> static int __send_tcp_queue(int sk, int queue, u32 len, struct cr_img *img)
> {
> - int ret, err = -1;
> + int ret, err = -1, max_chunk;
> int off;
> char *buf;
>
> @@ -472,17 +472,32 @@ static int __send_tcp_queue(int sk, int queue, u32 len, struct cr_img *img)
> if (read_img_buf(img, buf, len) < 0)
> goto err;
>
> + max_chunk = (queue == TCP_RECV_QUEUE ? kdat.tcp_max_rshare : len);
> off = 0;
> while (len) {
> int chunk = len;
>
> - if (queue == TCP_RECV_QUEUE && len > kdat.tcp_max_rshare)
> - chunk = kdat.tcp_max_rshare;
> + if (chunk > max_chunk)
> + chunk = max_chunk;
>
> ret = send(sk, buf + off, chunk, 0);
> if (ret <= 0) {
> - pr_perror("Can't restore %d queue data (%d), want (%d:%d)",
> - queue, ret, chunk, len);
> + if ((queue == TCP_RECV_QUEUE) && (max_chunk > 1024) && (errno == ENOMEM)) {
> + /*
> + * When restoring recv queue in repair mode
> + * kernel doesn't try hard and just allocates
> + * a linear skb with the size we pass to the
> + * system call. Thus, if the size is too big
> + * for slab allocator, the send just fails
> + * with ENOMEM. Try smaller chunk, hopefully
> + * there's still enough memory in the system.
> + */
> + max_chunk >>= 1;
> + continue;
> + }
> +
> + pr_perror("Can't restore %d queue data (%d), want (%d:%d:%d)",
> + queue, ret, chunk, len, max_chunk);
> goto err;
> }
> off += ret;
> --
> 1.9.3
>
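For readers who want to see the retry logic outside of the diff context, here is a minimal standalone sketch of the same halve-on-ENOMEM loop. It is not the CRIU code: send_shrinking(), send_fn, struct fake_sock, fake_send() and MIN_CHUNK are hypothetical names made up for illustration, and the callback stands in for send(2) on a socket in repair mode so the loop can be exercised without one. The 1024 floor mirrors the "max_chunk > 1024" check in the patch. Unlike the patch, the sketch clears errno before each call so that a zero return with a stale ENOMEM is not retried.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical callback standing in for send(sk, buf, len, 0). */
typedef ssize_t (*send_fn)(void *ctx, const char *buf, size_t len);

/* Same floor as the patch: stop halving once max_chunk is 1024 or less. */
#define MIN_CHUNK 1024

/*
 * Push len bytes through fn in pieces of at most max_chunk bytes.
 * On ENOMEM, halve max_chunk and retry the same offset.
 * Returns 0 on success, -1 on failure (errno is left as set by fn).
 */
static int send_shrinking(send_fn fn, void *ctx, const char *buf,
			  size_t len, size_t max_chunk)
{
	size_t off = 0;

	while (len) {
		size_t chunk = len < max_chunk ? len : max_chunk;
		ssize_t ret;

		errno = 0; /* assumption: avoid acting on a stale errno */
		ret = fn(ctx, buf + off, chunk);
		if (ret <= 0) {
			if (max_chunk > MIN_CHUNK && errno == ENOMEM) {
				max_chunk >>= 1;
				continue;
			}
			return -1;
		}
		off += (size_t)ret;
		len -= (size_t)ret;
	}
	return 0;
}

/* Hypothetical fake socket: rejects any chunk larger than limit. */
struct fake_sock {
	size_t limit; /* largest chunk the fake "kernel" accepts */
	size_t sent;  /* total bytes accepted so far */
	int calls;    /* number of send attempts */
};

static ssize_t fake_send(void *ctx, const char *buf, size_t len)
{
	struct fake_sock *fs = ctx;

	(void)buf;
	fs->calls++;
	if (len > fs->limit) {
		errno = ENOMEM;
		return -1;
	}
	fs->sent += len;
	return (ssize_t)len;
}
```

With a fake limit of 2048 and a starting max_chunk of 8192, the loop fails twice (8192, then 4096) and then sends everything in chunks of at most 2048 bytes. With a limit below the 1024 floor it gives up and returns -1, which corresponds to the pr_perror() path in the patch.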