[CRIU] criu and runc

Andrei Vagin avagin at virtuozzo.com
Thu Dec 8 09:56:34 PST 2016


On Thu, Dec 08, 2016 at 05:04:14PM +0100, Adrian Reber wrote:
> On Wed, Dec 07, 2016 at 09:40:10AM -0800, Andrei Vagin wrote:
> > On Wed, Dec 07, 2016 at 10:19:21AM +0100, Adrian Reber wrote:
> > > On Wed, Dec 07, 2016 at 12:29:43AM -0800, Andrei Vagin wrote:
> > > > On Tue, Dec 06, 2016 at 04:55:12PM +0100, Adrian Reber wrote:
> > > > > I tried to checkpoint and restore a runc container with today's git
> > > > > checkout. It works, but tcp-established is not really working.
> > > > > 
> > > > > I have a container with httpd running inside and I connect to it
> > > > > using 'telnet rhel0x 80' to keep the connection established.
> > > > > 
> > > > > I then do 'runc checkpoint rhel7-httpd --tcp-established' and 'runc
> > > > > restore -d rhel7-httpd --tcp-established'. Both commands are working.
> > > > 
> > > > Does the container have its own network namespace? What network
> > > > configuration is used for this container?
> > > 
> > > Host network. I am using 'oci-runtime-tool generate --network host' to
> > > generate the config and the namespace configuration looks like this:
> > > 
> > > > 		"namespaces": [
> > > 			{
> > > 				"type": "pid"
> > > 			},
> > > 			{
> > > 				"type": "ipc"
> > > 			},
> > > 			{
> > > 				"type": "uts"
> > > 			},
> > > 			{
> > > 				"type": "mount"
> > > 			}
> > > 		]
> > > 
> > > 
> > > > > In my telnet session I now type 'GET /' but I get a TCP reset:
> > > > > 
> > > > > 15:35:07.622294 IP dcbz.58608 > rhel0x.http: Flags [S], seq 1885340748, win 29200, options [mss 1460,sackOK,TS val 1499839760 ecr 0,nop,wscale 7], length 0
> > > > > 15:35:07.622342 IP rhel0x.http > dcbz.58608: Flags [S.], seq 1948584834, ack 1885340749, win 28960, options [mss 1460,sackOK,TS val 1521845 ecr 1499839760,nop,wscale 7], length 0
> > > > > 15:35:07.622409 IP dcbz.58608 > rhel0x.http: Flags [.], ack 1, win 229, options [nop,nop,TS val 1499839760 ecr 1521845], length 0
> > > > > 15:35:32.268394 IP dcbz.58608 > rhel0x.http: Flags [P.], seq 1:3, ack 1, win 229, options [nop,nop,TS val 1499864406 ecr 1521845], length 2
> > > > > 15:35:32.268433 IP rhel0x.http > dcbz.58608: Flags [R], seq 1948584835, win 0, length 0
> > > > > 
> > > > > https://lisas.de/~adrian/dump.log
> > > > 
> > > > (00.008968) Dumping inet socket at 3
> > > > (00.008972) 	Dumping: ino 0x   16f26 family    2 type    1 port        0 state  7 src_addr 0.0.0.0
> > > > (00.008974) 	Dumped: family 2 type 1 proto 6 port 0 state 7 src_addr 0.0.0.0
> > > > (00.008977) fdinfo: type: 0x 4 flags: 02000002/01 pos: 0x       0 fd: 3
> > > > (00.008991) 10989 fdinfo 4: pos: 0x               0 flags:          2000002/0x1
> > > > (00.008994) 	Searching for socket 16f27 (family 10.6)
> > > > (00.009001) No filter for socket
> > > > (00.009004) Dumping inet socket at 4
> > > > (00.009005) 	Dumping: ino 0x   16f27 family   10 type    1 port       80 state 10 src_addr ::
> > > > (00.009007) 	Dumped: family 10 type 1 proto 6 port 80 state 10 src_addr ::
> > > > 
> > > > I found only two TCP sockets: one is in the TCP_LISTEN (10) state
> > > > and the other is in the TCP_CLOSE (7) state. I expected to find
> > > > a socket in the TCP_ESTABLISHED state in the log.
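
For reference, the numeric states in the log excerpt above can be decoded
with a short script. This is only a sketch: the state table follows the
Linux kernel's TCP state numbering, while the regular expression and the
helper names (parse_dumped, has_established) are assumptions made for
illustration, based on the 'Dumped:' lines quoted here. Other criu
versions may format these lines differently.

```python
import re

# Linux TCP state numbers (kernel enum in include/net/tcp_states.h).
TCP_STATES = {
    1: "TCP_ESTABLISHED",
    2: "TCP_SYN_SENT",
    3: "TCP_SYN_RECV",
    4: "TCP_FIN_WAIT1",
    5: "TCP_FIN_WAIT2",
    6: "TCP_TIME_WAIT",
    7: "TCP_CLOSE",
    8: "TCP_CLOSE_WAIT",
    9: "TCP_LAST_ACK",
    10: "TCP_LISTEN",
    11: "TCP_CLOSING",
}

# Assumed line format, copied from the dump.log excerpt in this thread.
DUMPED_RE = re.compile(
    r"Dumped:\s+family\s+(\d+)\s+type\s+(\d+)\s+proto\s+(\d+)"
    r"\s+port\s+(\d+)\s+state\s+(\d+)\s+src_addr\s+(\S+)"
)


def parse_dumped(log_text):
    """Return one dict per 'Dumped:' line with the state number decoded."""
    sockets = []
    for match in DUMPED_RE.finditer(log_text):
        family, _type, _proto, port, state = (int(g) for g in match.groups()[:5])
        sockets.append({
            "family": family,
            "port": port,
            "state": TCP_STATES.get(state, "UNKNOWN(%d)" % state),
            "src_addr": match.group(6),
        })
    return sockets


def has_established(log_text):
    """True if any dumped inet socket is in the TCP_ESTABLISHED state."""
    return any(s["state"] == "TCP_ESTABLISHED" for s in parse_dumped(log_text))
```

Running has_established() over the dump log before restoring would show
whether the connection was really established at dump time.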
> > > 
> > > Yes, it is interesting that it cannot be seen in the log file.
> > > 
> > > I had a closer look at netstat before and during the dump and I see that my
> > > test method is flawed. Running 'telnet rhel0x 80' puts the TCP connection in
> > > SYN_SENT, and it only becomes established after hitting enter for the first time.
> > > 
> > > Having an actual established connection gives me the following error during restore:
> > > 
> > > (00.135695)      7: Error (criu/sk-inet.c:638): Connected TCP socket in image
> > 
> >         if (tcp_connection(ie)) {
> >                 if (!opts.tcp_established_ok) {
> >                         pr_err("Connected TCP socket in image\n");
> >                         goto err;
> >                 }
> > 
> > This error means that --tcp-established was not set for "criu restore".
> 
> Now the connection seems to be correctly restored. There seems to be
> a difference in where parameters can be specified on the runc
> command line:
> 
> I was using:
> 
>  * runc restore -d rhel7-httpd --tcp-established
> 
> but the container ID needs to be the last parameter. So this works:
> 
>  * runc restore --tcp-established -d rhel7-httpd
> 
> For 'runc checkpoint' I can specify '--tcp-established' before or after
> the container ID. So that is kind of strange. But now it works for me,
> that's good for now.

I think we need to either fix this in runc or file an issue with runc
about it.
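
A likely explanation (an assumption, not verified against the runc
source) is that the option parser stops at the first positional
argument, so anything after the container ID is treated as a plain
argument rather than as a flag. Go's standard flag package documents
this behaviour. It would not explain why 'runc checkpoint' accepts the
flag in either position. The sketch below reproduces the two parsing
styles with Python's getopt module; the option table and the function
names are hypothetical and only mirror the commands from this thread.

```python
import getopt

# Hypothetical option table mirroring the flags used in this thread.
SHORT_OPTS = "d"
LONG_OPTS = ["tcp-established"]


def parse_strict(argv):
    """POSIX-style parsing: stop at the first positional argument."""
    opts, rest = getopt.getopt(argv, SHORT_OPTS, LONG_OPTS)
    return [name for name, _value in opts], rest


def parse_permissive(argv):
    """GNU-style parsing: flags may appear before or after positionals.

    Note: gnu_getopt falls back to strict parsing if the environment
    variable POSIXLY_CORRECT is set.
    """
    opts, rest = getopt.gnu_getopt(argv, SHORT_OPTS, LONG_OPTS)
    return [name for name, _value in opts], rest


if __name__ == "__main__":
    failing = ["-d", "rhel7-httpd", "--tcp-established"]
    working = ["--tcp-established", "-d", "rhel7-httpd"]
    for argv in (failing, working):
        flags, positional = parse_strict(argv)
        print("strict    ", argv, "-> flags", flags, "positional", positional)
        flags, positional = parse_permissive(argv)
        print("permissive", argv, "-> flags", flags, "positional", positional)
```

With the strict parser, '--tcp-established' placed after the container
ID ends up in the positional list and is never seen as a flag, which
matches the "Connected TCP socket in image" error on restore.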

> 
> Thanks for your help!

You are welcome!

> 
> 		Adrian
