[CRIU] problem restoring unix queues?

Tycho Andersen tycho.andersen at canonical.com
Mon Jul 20 08:25:40 PDT 2015


On Mon, Jul 20, 2015 at 05:43:38PM +0300, Pavel Emelyanov wrote:
> On 07/20/2015 05:28 PM, Tycho Andersen wrote:
> > On Mon, Jul 20, 2015 at 02:13:52PM +0300, Pavel Emelyanov wrote:
> >> On 07/18/2015 02:17 AM, Tycho Andersen wrote:
> >>> On Fri, Jul 17, 2015 at 07:13:55PM +0300, Pavel Emelyanov wrote:
> >>>> On 07/17/2015 05:33 PM, Tycho Andersen wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> Sometimes I see something like:
> >>>>>
> >>>>> (00.095976)     77: Error (sk-queue.c:238): Failed to send packet: Resource temporarily unavailable
> >>>>>
> >>>>> when restoring (full log here: http://paste.ubuntu.com/11893011/).
> >>>>>
> >>>>> The sk-queues.img is: http://paste.ubuntu.com/11893015/ so I don't
> >>>>> /think/ it should be filling the buffer, so I'm not sure why we'd get
> >>>>> EAGAIN.
> >>>>>
> >>>>> Thoughts?
> >>>>
> >>>> Is it stream or datagram socket? AFAIK the EAGAIN is only reported when
> >>>> you hit socket buffer limit, but we try to raise one :\
> >>>
> >>> Assuming the "id" is the same as "id_for" above, it looks like it is
> >>> SOCK_DGRAM (output from unixsk.img):
> >>>
> >>>         {
> >>>             "id": 14, 
> >>>             "ino": 16798503, 
> >>>             "type": 2, # SOCK_DGRAM == 2
> >>>             "state": 7, 
> >>>             "flags": "0x80802", 
> >>>             "uflags": "0x0", 
> >>>             "backlog": 0, 
> >>>             "peer": 0, 
> >>>             "fown": {
> >>>                 "uid": 0, 
> >>>                 "euid": 0, 
> >>>                 "signum": 0, 
> >>>                 "pid_type": 0, 
> >>>                 "pid": 0
> >>>             }, 
> >>>             "opts": {
> >>>                 "so_sndbuf": 212992, 
> >>>                 "so_rcvbuf": 16777216, 
> >>>                 "so_snd_tmo_sec": 0, 
> >>>                 "so_snd_tmo_usec": 0, 
> >>>                 "so_rcv_tmo_sec": 0, 
> >>>                 "so_rcv_tmo_usec": 0, 
> >>>                 "reuseaddr": true, 
> >>>                 "so_priority": 0, 
> >>>                 "so_rcvlowat": 1, 
> >>>                 "so_mark": 0, 
> >>>                 "so_passcred": true, 
> >>>                 "so_passsec": true, 
> >>>                 "so_dontroute": false, 
> >>>                 "so_no_check": false
> >>>             }, 
> >>>             "name": "L3J1bi9zeXN0ZW1kL2pvdXJuYWwvc29ja2V0AA==\n", 
> >>>             "file_perms": {
> >>>                 "mode": 49590, 
> >>>                 "uid": 0, 
> >>>                 "gid": 0
> >>>             }
> >>>         }, 
> >>>
> >>> Based on my read of the criu/kernel sources, we're setting
> >>> sk->sk_sndbuf, while the kernel source in unix_dgram_sndmsg is checking
> >>> sk->sk_receive_queue for sending EAGAIN; since we pass the backlog of
> >>> 0 to listen() (viz. above), I guess this is what's causing the
> >>> problem? Seems like a backlog of 0 must be wrong, since the queue is
> >>> of non-zero length.
> >>
> >> Wait a second, if you send packets to socket it cannot be in listen
> >> state. Moreover, according to your image the state of the socket is
> >> 7 which is CLOSED :)
> > 
> > Hrm. So perhaps we should not be sending stuff to this socket at all?
> > Should we add a check for TCP_CLOSED to guard the call to
> > dump_sk_queue?
> 
> We dump only read queue that can be alive for closed sockets. E.g. the
> test zdtm/live/static/socket02.c does exactly that.

I see. So isn't it possible then that the socket client end was
created from a server that did listen(..., 0), and that's what is
causing the problem? The backlog is the only thing that I can see that
can cause EAGAIN from unix_dgram_sendmsg().

For the example above, it looks like the socket that did a listen().
The peer that is failing to send is:

# 0x1005327 == 16798503, the peer above
# 0x1005ffe == 16801790, the peer below

(00.095929)     77:   Connect 0x1005ffe to 0x1005327
(00.095937)     77: Trying to restore recv queue for 14
(00.095949)     77:   Restoring 357-bytes skb for 14
(00.095976)     77: Error (sk-queue.c:238): Failed to send packet: Resource temporarily unavailable

        {
            "id": 44, 
            "ino": 16801790, 
            "type": 2, 
            "state": 7, 
            "flags": "0x80002", 
            "uflags": "0x0", 
            "backlog": 0, 
            "peer": 16798503, 
            "fown": {
                "uid": 0, 
                "euid": 0, 
                "signum": 0, 
                "pid_type": 0, 
                "pid": 0
            }, 
            "opts": {
                "so_sndbuf": 16777216, 
                "so_rcvbuf": 212992, 
                "so_snd_tmo_sec": 10, 
                "so_snd_tmo_usec": 0, 
                "so_rcv_tmo_sec": 0, 
                "so_rcv_tmo_usec": 0, 
                "reuseaddr": false, 
                "so_priority": 0, 
                "so_rcvlowat": 1, 
                "so_mark": 0, 
                "so_passcred": false, 
                "so_passsec": false, 
                "so_dontroute": false, 
                "so_no_check": false
            }, 
            "name": ""
        }

Both ends of this have a backlog of 0 (although only the "server" side
matters). So the socket id 14 does a listen(..., 0), socket id 44 does
a connect() to it, and then tries to send stuff to the queue and fails
with EAGAIN? (Looks like both ends of this are closed which is also
strange...)

Tycho


More information about the CRIU mailing list