[CRIU] Checkpoint and restore application has established unix domain socket connections.
Jun Gan
geminijun at gmail.com
Mon Oct 19 09:55:15 MSK 2020
Hi Nicolas,
Thanks a lot for sharing your solution. I just tried it, and it gave me this
error on restore:
(00.165775) 773832: unix: Opening standalone (stage 0 id 0x10 ino 7105412 peer 7105411)
(00.165791) 773832: unix: bind id 0x10 ino 7105412 addr /tmp/redis.sock
(00.165798) 773832: Error (criu/sk-unix.c:1637): unix: Can't bind id 0x10 ino 7105412 addr /tmp/redis.sock: Address already in use
(00.165804) 773832: Error (criu/files.c:1211): Unable to open fd=8 id=0x10
(00.166031) Error (criu/cr-restore.c:1565): 773832 exited, status=1
(00.166044) Error (criu/cr-restore.c:2488): Restoring FAILED.
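As a side note, the "Address already in use" part can be reproduced outside CRIU: binding a second unix socket to a path that is already bound fails the same way. Below is a minimal Python sketch of that, not CRIU code; the helper name bind_twice and the temporary socket path are made up for illustration and only stand in for /tmp/redis.sock.

```python
import errno
import os
import socket
import tempfile


def bind_twice(path):
    """Bind two AF_UNIX stream sockets to the same path.

    Returns the errno of the second bind(), or 0 if it unexpectedly succeeds.
    """
    first = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        first.bind(path)  # creates the socket file on disk
        try:
            second.bind(path)  # the path already exists, so this fails
        except OSError as e:
            return e.errno
        return 0
    finally:
        first.close()
        second.close()
        if os.path.exists(path):
            os.unlink(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical path, standing in for /tmp/redis.sock.
        err = bind_twice(os.path.join(d, "demo.sock"))
        print(errno.errorcode.get(err, err))
```

On Linux the second bind() reports EADDRINUSE, which matches the message in the log above.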
It seems CRIU treats the socket as a server in this case. I also tried
skipping the bind when the state is TCP_CLOSE, but that gives me a segfault
because it still tries to connect back to the previous peer:
(00.133302) 773853: Create fd for 6
(00.133305) 773853: unix: Opening standalone (stage 0 id 0xf ino 7113121 peer 0)
(00.133329) 773853: unix: bind id 0xf ino 7113121 addr /tmp/redis.sock
(00.133420) 773853: unix: Putting 7113121 into listen state
(00.133425) 773853: sockets: 7 restore sndbuf 212992 rcv buf 212992
(00.133428) 773853: sockets: restore priority 0 for socket
(00.133430) 773853: sockets: restore rcvlowat 1 for socket
(00.133432) 773853: sockets: restore mark 0 for socket
(00.133436) 773853: Create fd for 7
(00.133438) 773853: unix: Opening standalone (stage 0 id 0x10 ino 7113153 peer 7113152)
(00.133451) 773853: Create fd for 8
(00.133455) 773853: unix: Opening standalone (stage 1 id 0x10 ino 7113153 peer 7113152)
(00.133457) 773853: unix: Connect 7113153 to 7113152
(02.630685) Error: 773853 killed by signal 11: Segmentation fault
(02.630709) Error: Restoring FAILED.
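For comparison, at the plain socket level a connect() to a unix socket path whose listener has gone away fails with an error code rather than crashing the caller. A minimal Python sketch of that behaviour follows. It runs outside CRIU and says nothing about what sk-unix.c does internally; the helper name connect_to_stale_peer and the temporary path are made up for illustration.

```python
import errno
import os
import socket
import tempfile


def connect_to_stale_peer(path):
    """Connect to a unix socket path that no longer has a listener.

    Returns the symbolic errno name of the failed connect(), or None if
    the connect unexpectedly succeeds.
    """
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)
    server.close()  # the socket file stays on disk, but nobody listens on it

    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(path)
        return None
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
    finally:
        client.close()
        os.unlink(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical path, standing in for the original peer's socket.
        print(connect_to_stale_peer(os.path.join(d, "stale.sock")))
```

On Linux this reports ECONNREFUSED, so a client that is simply told to re-connect can detect the missing peer and retry.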
Where do you handle the TCP_CLOSE state during restore? Also, I don't
quite understand what fle->stage means here.
Thanks,
Jun Gan
On Tue, Sep 8, 2020 at 1:42 PM Nicolas Viennot <Nicolas.Viennot at twosigma.com>
wrote:
> I think we had the same need. Here’s the commit that solves this problem:
> https://github.com/twosigma/criu/commit/e562e97f29c98d155b02a871493533b24ecf2abb
>
> Nico
>
> ---
>
> From: criu-bounces at openvz.org <criu-bounces at openvz.org> On Behalf Of Jun
> Gan
> Sent: Sunday, September 6, 2020 7:51 PM
> To: criu at openvz.org
> Subject: [CRIU] Checkpoint and restore application has established unix
> domain socket connections.
>
> Hi,
>
> I'm using CRIU to checkpoint and restore redis, which has a redis-cli
> connected to it via a unix domain socket. I cannot just use --ext-unix-sk
> as it sets socket type to STREAM. Here is the command I used for dump:
>
> criu dump -t 23253 -D criu_imgs/ -o dump.log --shell-job -v4 --external
> unix[2940132]
>
> And here is the command I used for restore:
>
> criu restore -D criu_imgs/ -o restore_.log --shell-job -v4 --ext-unix-sk
> --tcp-close
>
> Then I found I got segv in the restore log:
>
> (00.014810) 23253: unix: Opening standalone (stage 0 id 0x10 ino 2940132
> peer 2940131)
> (00.014823) 23253: Create fd for 8
> (00.014826) 23253: unix: Opening standalone (stage 1 id 0x10 ino 2940132
> peer 2940131)
> (00.014829) 23253: unix: Connect 2940132 to 2940131
> (00.015683) Error (criu/cr-restore.c:1417): 23253 killed by signal 11:
> Segmentation fault
> (00.015780) Error (criu/cr-restore.c:2293): Restoring FAILED.
>
>
> Checking the code, I found that CRIU still tries to restore this socket
> by connecting it back to the original peer, which may not be available
> anymore. Is there any option to let CRIU skip restoring this socket and
> leave it closed, just like "--tcp-close"? In my case, I don't need it
> anymore, and can just ask the client to re-connect.
>
> --
> Jun Gan
>
--
Jun Gan