[CRIU] restoring sockets with data pending

Fri Oct 10 08:04:16 PDT 2014

On Fri, Oct 10, 2014 at 06:55:33PM +0400, Andrew Vagin wrote:
> On Fri, Oct 10, 2014 at 09:25:26AM -0500, Tycho Andersen wrote:
> > On Fri, Oct 10, 2014 at 06:10:19PM +0400, Andrew Vagin wrote:
> > > On Fri, Oct 10, 2014 at 08:55:03AM -0500, Tycho Andersen wrote:
> > > > On Fri, Oct 10, 2014 at 03:39:41PM +0400, Andrew Vagin wrote:
> > > > > On Thu, Oct 09, 2014 at 03:23:20PM -0500, Tycho Andersen wrote:
> > > > > > Hi Pavel, Andrew,
> > > > > > 
> > > > > > On Mon, Sep 15, 2014 at 02:21:18PM +0400, Pavel Emelyanov wrote:
> > > > > > > On 09/13/2014 04:55 AM, Tycho Andersen wrote:
> > > > > > > > Hi all,
> > > > > > > > 
> > > > > > > > One of the errors that I sometimes get when dumping things is:
> > > > > > > > 
> > > > > > > > Error (sk-netlink.c:73): The socket has data to read
> > > > > > > 
> > > > > > > Yup :) These are netlink sockets with some data from the kernel.
> > > > > > > 
> > > > > > > > What is necessary to fix this problem? I guess there is some work to
> > > > > > > > be done on the kernel side to provide an API for getting these
> > > > > > > > buffers? Is there some other trick we can use?
> > > > > > > 
> > > > > > > Strictly speaking, we should patch the kernel to be able to peek
> > > > > > > this data from socket and to put it back. But maybe in some cases
> > > > > > > we could invent some workaround (I'm not quite sure netdev@ people
> > > > > > > would be happy with such hacks :) ).
> > > > > > > 
> > > > > > > Can we investigate what kind of socket it is and what data is in
> > > > > > > there?
> > > > > > 
> > > > > > Finally getting around to looking at this; lsof says:
> > > > > > 
> > > > > > systemd-u 179 root    4u  netlink                         0t0  13252 KOBJECT_UEVENT
> > > > > 
> > > > > http://www.makelinux.net/books/lkd2/ch17lev1sec9
> > > > > 
> > > > > How often do you see data in this socket?
> > > > 
> > > > The failure only happens on the first checkpoint after each fresh boot
> > > > of the host. After one such failure, everything seems to work just fine.
> > > > 
> > > > > If "systemd-u" is systemd-udev, you can try to use "udevadm monitor" to
> > > > > find out which events are here.
> > > > 
> > > > Here's the full log, 53s is the container boot, 58s is the criu dump:
> > > > 
> > > > monitor will print the received events for:
> > > > UDEV - the event which udev sends out after rule processing
> > > > KERNEL - the kernel uevent
> > > > 
> > > > KERNEL[53.770082] add      /module/veth (module)
> > > > KERNEL[53.771621] add      /devices/virtual/net/vethUIX1V6 (net)
> > > > UDEV  [53.771644] add      /module/veth (module)
> > > > KERNEL[53.771658] add      /devices/virtual/net/vethUIX1V6/queues/rx-0 (queues)
> > > > KERNEL[53.771669] add      /devices/virtual/net/vethUIX1V6/queues/tx-0 (queues)
> > > > KERNEL[53.771682] add      /devices/virtual/net/veth2XKLIH (net)
> > > > KERNEL[53.771692] add      /devices/virtual/net/veth2XKLIH/queues/rx-0 (queues)
> > > > KERNEL[53.771702] add      /devices/virtual/net/veth2XKLIH/queues/tx-0 (queues)
> > > > KERNEL[53.777260] add      /kernel/slab/nf_conntrack_ffff880074adb000 (slab)
> > > > KERNEL[53.777716] add      /devices/virtual/net/lo/queues/rx-0 (queues)
> > > > KERNEL[53.777736] add      /devices/virtual/net/lo/queues/tx-0 (queues)
> > > > UDEV  [53.777836] add      /kernel/slab/nf_conntrack_ffff880074adb000 (slab)
> > > > UDEV  [53.779071] add      /devices/virtual/net/lo/queues/rx-0 (queues)
> > > > UDEV  [53.779380] add      /devices/virtual/net/lo/queues/tx-0 (queues)
> > > > UDEV  [53.783180] add      /devices/virtual/net/vethUIX1V6 (net)
> > > > UDEV  [53.784406] add      /devices/virtual/net/vethUIX1V6/queues/rx-0 (queues)
> > > > UDEV  [53.784419] add      /devices/virtual/net/vethUIX1V6/queues/tx-0 (queues)
> > > > UDEV  [53.785438] add      /devices/virtual/net/veth2XKLIH (net)
> > > > UDEV  [53.785456] add      /devices/virtual/net/veth2XKLIH/queues/tx-0 (queues)
> > > > UDEV  [53.785464] add      /devices/virtual/net/veth2XKLIH/queues/rx-0 (queues)
> > > > KERNEL[53.808246] remove   /devices/virtual/net/vethUIX1V6 (net)
> > > > UDEV  [53.809029] remove   /devices/virtual/net/vethUIX1V6 (net)
> > > > KERNEL[58.967936] add      /module/unix_diag (module)
> > > > UDEV  [58.969124] add      /module/unix_diag (module)
> > > > KERNEL[58.973698] add      /module/inet_diag (module)
> > > > UDEV  [58.973712] add      /module/inet_diag (module)
> > > > KERNEL[58.975979] add      /module/tcp_diag (module)
> > > > UDEV  [58.976274] add      /module/tcp_diag (module)
> > > > KERNEL[58.978291] add      /module/udp_diag (module)
> > > > UDEV  [58.979284] add      /module/udp_diag (module)
> > > > KERNEL[58.982102] add      /module/af_packet_diag (module)
> > > > UDEV  [58.983034] add      /module/af_packet_diag (module)
> > > > KERNEL[58.985707] add      /module/netlink_diag (module)
> > > > UDEV  [58.986084] add      /module/netlink_diag (module)
> > > 
> > > Here is the answer. These are events about newly loaded modules. So if
> > > you load these modules before dumping, the events will not be generated
> > > during the dump and there will be no data in the socket.
> > 
> > Yes, that fixes it for me. What's the right solution here? I guess the
> > real solution is to dump the data in the socket,
> 
> That doesn't work. For example, when you request information about all unix
> sockets, the kernel doesn't give you all the data immediately. It gives you
> information about a few sockets, and when you read this data, the kernel
> generates the next portion.
> 
> The resulting data can be so large that the kernel will never allow
> you to enqueue it back.
> 
> And the logic for generating the next portion of data is different for
> each type of netlink socket.

Ah, ok.

> 
> > but barring that
> > should we just force the modules to load before we use them?
> 
> I suggest loading these modules and forgetting about this problem until it
> becomes critical.

Yes; the only question is where to load the modules. It seems like
users (of high-level tools) shouldn't be forced to load these, so we
can either load them at criu time or beforehand in lxc-checkpoint or
something.
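
To make the "load them beforehand" option concrete, here is a rough
sketch of what a wrapper could do before calling criu. It is only an
illustration, not how criu or lxc-checkpoint actually behave: the helper
names (loaded_modules, missing_modules, preload) are made up, and the
module list is simply the set of *_diag modules that showed up in the
udevadm monitor log above. It assumes that /proc/modules lists loadable
modules with the name in the first column, and that plain "modprobe
<name>" run as root is enough to load each one.

```python
import subprocess
from pathlib import Path

# Module names copied from the udevadm monitor log in this thread.
DIAG_MODULES = (
    "unix_diag",
    "inet_diag",
    "tcp_diag",
    "udp_diag",
    "af_packet_diag",
    "netlink_diag",
)


def loaded_modules(proc_modules_text):
    """Return the module names found in /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def missing_modules(wanted, proc_modules_text):
    """Return the wanted modules that are not loaded, in the given order."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in wanted if name not in loaded]


def preload(wanted=DIAG_MODULES, proc_modules=Path("/proc/modules"),
            dry_run=True):
    """Build (and optionally run) one modprobe command per missing module.

    Hypothetical helper: with dry_run=True nothing is executed and the
    commands are only returned, so the caller can inspect them first.
    """
    commands = [
        ["modprobe", name]
        for name in missing_modules(wanted, proc_modules.read_text())
    ]
    if not dry_run:
        for command in commands:
            # Needs root; raises CalledProcessError if modprobe fails.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    for command in preload(dry_run=True):
        print(" ".join(command))
```

Run as-is, this only prints the modprobe commands it would issue. Modules
built into the kernel do not appear in /proc/modules, so they would be
reported as missing here; modprobe should treat those as a no-op, but I
have not checked that on every setup. Loading the modules still produces
uevents, so this presumably only helps if udev has had time to read them
before the dump starts.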

Tycho