[CRIU] problem restoring unix queues?
Pavel Emelyanov
xemul at parallels.com
Mon Jul 20 10:08:35 PDT 2015
On 07/20/2015 06:25 PM, Tycho Andersen wrote:
> On Mon, Jul 20, 2015 at 05:43:38PM +0300, Pavel Emelyanov wrote:
>> On 07/20/2015 05:28 PM, Tycho Andersen wrote:
>>> On Mon, Jul 20, 2015 at 02:13:52PM +0300, Pavel Emelyanov wrote:
>>>> On 07/18/2015 02:17 AM, Tycho Andersen wrote:
>>>>> On Fri, Jul 17, 2015 at 07:13:55PM +0300, Pavel Emelyanov wrote:
>>>>>> On 07/17/2015 05:33 PM, Tycho Andersen wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Sometimes I see something like:
>>>>>>>
>>>>>>> (00.095976) 77: Error (sk-queue.c:238): Failed to send packet: Resource temporarily unavailable
>>>>>>>
>>>>>>> when restoring (full log here: http://paste.ubuntu.com/11893011/).
>>>>>>>
>>>>>>> The sk-queues.img is: http://paste.ubuntu.com/11893015/ so I don't
>>>>>>> /think/ it should be filling the buffer, so I'm not sure why we'd get
>>>>>>> EAGAIN.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>
>>>>>> Is it stream or datagram socket? AFAIK the EAGAIN is only reported when
>>>>>> you hit socket buffer limit, but we try to raise one :\
>>>>>
>>>>> Assuming the "id" is the same as "id_for" above, it looks like it is
>>>>> SOCK_DGRAM (output from unixsk.img):
>>>>>
>>>>> {
>>>>> "id": 14,
>>>>> "ino": 16798503,
>>>>> "type": 2, # SOCK_DGRAM == 2
>>>>> "state": 7,
>>>>> "flags": "0x80802",
>>>>> "uflags": "0x0",
>>>>> "backlog": 0,
>>>>> "peer": 0,
>>>>> "fown": {
>>>>> "uid": 0,
>>>>> "euid": 0,
>>>>> "signum": 0,
>>>>> "pid_type": 0,
>>>>> "pid": 0
>>>>> },
>>>>> "opts": {
>>>>> "so_sndbuf": 212992,
>>>>> "so_rcvbuf": 16777216,
>>>>> "so_snd_tmo_sec": 0,
>>>>> "so_snd_tmo_usec": 0,
>>>>> "so_rcv_tmo_sec": 0,
>>>>> "so_rcv_tmo_usec": 0,
>>>>> "reuseaddr": true,
>>>>> "so_priority": 0,
>>>>> "so_rcvlowat": 1,
>>>>> "so_mark": 0,
>>>>> "so_passcred": true,
>>>>> "so_passsec": true,
>>>>> "so_dontroute": false,
>>>>> "so_no_check": false
>>>>> },
>>>>> "name": "L3J1bi9zeXN0ZW1kL2pvdXJuYWwvc29ja2V0AA==\n",
>>>>> "file_perms": {
>>>>> "mode": 49590,
>>>>> "uid": 0,
>>>>> "gid": 0
>>>>> }
>>>>> },
>>>>>
>>>>> Based on my read of the criu/kernel sources, we're setting
>>>>> sk->sk_sndbuf, while the kernel source in unix_dgram_sndmsg is checking
>>>>> sk->sk_receive_queue for sending EAGAIN; since we pass the backlog of
>>>>> 0 to listen() (viz. above), I guess this is what's causing the
>>>>> problem? Seems like a backlog of 0 must be wrong, since the queue is
>>>>> of non-zero length.
>>>>
>>>> Wait a second, if you send packets to socket it cannot be in listen
>>>> state. Moreover, according to your image the state of the socket is
>>>> 7 which is CLOSED :)
>>>
>>> Hrm. So perhaps we should not be sending stuff to this socket at all?
>>> Should we add a check for TCP_CLOSED to guard the call to
>>> dump_sk_queue?
>>
>> We dump only read queue that can be alive for closed sockets. E.g. the
>> test zdtm/live/static/socket02.c does exactly that.
>
> I see. So isn't it possible then that the socket client end was
> created from a server that did listen(..., 0),
But sockets we're having here are SOCK_DGRAM, aren't they?
> and that's what is
> causing the problem? The backlog is the only thing that I can see that
> can cause EAGAIN from unix_dgram_sendmsg().
>
> For the example above, it looks like the socket that did a listen().
> The peer that is failing to send is:
>
> # 0x1005327 == 16798503, the peer above
> # 0x1005ffe == 16801790, the peer below
>
> (00.095929) 77: Connect 0x1005ffe to 0x1005327
> (00.095937) 77: Trying to restore recv queue for 14
> (00.095949) 77: Restoring 357-bytes skb for 14
> (00.095976) 77: Error (sk-queue.c:238): Failed to send packet: Resource temporarily unavailable
Ah it looks like I finally got what you mean :)
There are two places in unix_dgram_sendmsg that result in EAGAIN.
First is hitting send socket sndbuf, we (seem to) address that
by raising it with sockopt. But there's another one -- check for
unix_recvq_full(other). It checks for the number of packets in
queue doesn't exceed the max_ack_backlog :(
Are you talking about it?
If yes, how has it happened that datagram socket has more data
packets (as we see on dump) than this limit?
> {
> "id": 44,
> "ino": 16801790,
> "type": 2,
> "state": 7,
> "flags": "0x80002",
> "uflags": "0x0",
> "backlog": 0,
> "peer": 16798503,
> "fown": {
> "uid": 0,
> "euid": 0,
> "signum": 0,
> "pid_type": 0,
> "pid": 0
> },
> "opts": {
> "so_sndbuf": 16777216,
> "so_rcvbuf": 212992,
> "so_snd_tmo_sec": 10,
> "so_snd_tmo_usec": 0,
> "so_rcv_tmo_sec": 0,
> "so_rcv_tmo_usec": 0,
> "reuseaddr": false,
> "so_priority": 0,
> "so_rcvlowat": 1,
> "so_mark": 0,
> "so_passcred": false,
> "so_passsec": false,
> "so_dontroute": false,
> "so_no_check": false
> },
> "name": ""
> }
>
> Both ends of this have a backlog of 0 (although only the "server" side
> matters). So the socket id 14 does a listen(..., 0), socket id 44 does
> a connect() to it, and then tries to send stuff to the queue and fails
> with EAGAIN? (Looks like both ends of this are closed which is also
> strange...)
>
> Tycho
> .
>
More information about the CRIU
mailing list