[CRIU] problem in post_open_unix_sk?
Pavel Emelyanov
xemul at parallels.com
Tue Jul 14 05:04:33 PDT 2015
On 07/01/2015 02:38 AM, Tycho Andersen wrote:
> Hi Pavel,
>
> On Wed, Jun 24, 2015 at 12:17:10AM +0300, Pavel Emelyanov wrote:
>> On 06/23/2015 09:25 PM, Tycho Andersen wrote:
>>> Hi all,
>>>
>>> When trying to c/r wiley (i.e. systemd-based) containers, I'm getting
>>> crashes like the following sometimes:
>>>
>>> http://paste.ubuntu.com/11763264/
>>> (00.135271) 396: Error (sk-unix.c:701): Can't connect 0x6d126 socket: Connection refused
>>>
>>> Any ideas as to what this might be?
>>
>> I've found several reasons for ECONNREFUSED
>>
>> 1. Target path/name doesn't exist
>> 2. Target path is not a socket
>> 3. Target stream socket is not listen()-ing
>> 4. Target dgram socket is connect()-ed to someone else
>>
>> Can you check the images for what kind of sockets CRIU is trying to
>> interconnect?
>
> http://paste.ubuntu.com/11801928/ is the contents of unixsk.img (the
> log is http://paste.ubuntu.com/11801992/). In particular, it looks
> like this is the culprit:
>
[snip]
>
> I'm not sure how the path is interpreted though; it's not in the root,
> in any case, and the base64 decoding is just an integer. Its peer's
> name is base64 encoded /run/systemd/private, so I guess this is a
> client of that?
Ah, I see :) This is a race. Look at the logs:
(00.135950) 1: Opening standalone socket (id 0xb ino 0x9422f peer 0)
(00.135974) 353: Error (sk-unix.c:701): Can't connect 0x947c4 socket: Connection refused
(00.136390) 1: Error (cr-restore.c:1228): 353 exited, status=1
(00.136407) 1: Putting 0x9422f into listen state
The socket 0x9422f is in listen state (according to the image) and
0x947c4 is connected to it. On restore, process 1 opens the first socket,
then 353 tries to connect the second to it and fails since the target socket
is NOT (yet) listening. Putting it into listen state happens later, as the
last line in the logs shows.
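
The failure mode can be reproduced outside of CRIU with a small
standalone sketch. This is not CRIU code: the helper names
(try_connect, demo_listen_race) and the socket path passed in are made
up for illustration, and it assumes Linux AF_UNIX stream semantics,
where connect() to a bound but not yet listening socket fails with
ECONNREFUSED.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: fill a sockaddr_un for the given path. */
static void fill_addr(struct sockaddr_un *addr, const char *path)
{
	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	strncpy(addr->sun_path, path, sizeof(addr->sun_path) - 1);
}

/* Hypothetical helper: connect a fresh stream socket to path.
 * Returns 0 on success, otherwise the errno of the failing call. */
static int try_connect(const char *path)
{
	struct sockaddr_un addr;
	int fd, err = 0;

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return errno;

	fill_addr(&addr, path);
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		err = errno;

	close(fd);
	return err;
}

/* Bind a stream socket to path, then attempt a connect both before
 * and after listen(). The connect results are stored in *before and
 * *after. Returns 0 if the server side was set up, -1 otherwise. */
int demo_listen_race(const char *path, int *before, int *after)
{
	struct sockaddr_un addr;
	int srv, ret = -1;

	srv = socket(AF_UNIX, SOCK_STREAM, 0);
	if (srv < 0)
		return -1;

	fill_addr(&addr, path);
	unlink(path); /* drop a stale socket file, if any */
	if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;

	/* Socket exists but is not listening yet: the state that
	 * process 353 ran into in the log above. */
	*before = try_connect(path);

	if (listen(srv, 1) < 0)
		goto out_unlink;

	/* Now the target is in listen state; the connection is queued
	 * in the backlog even though nobody calls accept(). */
	*after = try_connect(path);
	ret = 0;

out_unlink:
	unlink(path);
out:
	close(srv);
	return ret;
}
```

Calling demo_listen_race() with some scratch path should leave
ECONNREFUSED in *before and 0 in *after, which matches the ordering
seen in the restore log: the connect is only safe once the target has
been put into listen state.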
-- Pavel