[CRIU] criu + threaded program + TCP_REPAIR

Pavel Emelyanov xemul at parallels.com
Mon Oct 13 23:34:53 PDT 2014


On 10/13/2014 07:11 PM, Sowmini Varadhan wrote:
> Hello,

Hi!

> I wanted to observe the TCP_REPAIR code in action, to get a better
> understanding of it. So I tried the following in a qemu-kvm
> env, but I'm running into some errors- not sure if I'm hitting something
> that is not supported yet, or if I have some user-error.
> 
> 1. Start the server in vm1.
>      server# nohup iperf -s &
> 
> 2. start the client, and set up parameters so that it runs for a long time
>       client# iperf -c <srvaddr> -P 10 -t 900
> 
> 3. Checkpoint the server
>       server# criu dump -D .  -t 10186 --shell-job --tcp-established --shel-job
> 
> 4. Add the server's address on dummy0 on the client
>       client# ip link add dummy0 type dummy
>       client# ip addr add <srvaddr>  dev dummy0
> 
> 5. Copy the checkpoint files over to the client (duplicate the dir structure)
>    and restore

First of all, it's not enough to just copy the files. If you want
to move a TCP connection you should at least make sure that

a) the same IP address that was used on the source node is available
   on the destination
b) the netfilter rule that CRIU created on dump to lock the connection
   also exists on the destination (http://criu.org/TCP_connection)
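
A quick way to check point (a) on the destination is to look for the
address in the output of `ip -o -4 addr show`. The sketch below is only an
illustration: `has_addr` is a made-up helper name, and it assumes the usual
one-line-per-address output where the fourth field is ADDR/PREFIX. It does
not check the netfilter rule from point (b).

```shell
# has_addr ADDR: hypothetical helper, not part of CRIU.
# Reads `ip -o -4 addr show` output on stdin and succeeds if ADDR
# is configured on some interface.
has_addr() {
    awk -v want="$1" '
        { split($4, parts, "/"); if (parts[1] == want) found = 1 }
        END { exit !found }
    '
}

# Usage on the destination (replace <srvaddr> with the server address):
#   ip -o -4 addr show | has_addr <srvaddr> && echo "address present"
```

Reading the listing from stdin keeps the helper independent of how the
address list is obtained, so it can also be run against saved output.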

>        client# criu restore -v4 --tree 10186 --images-dir /root/images \
>                          --tcp-established --shell-job
> 
> I get the error
> 
>      :
>     pie: Restoring EXE link
>     pie: Restoring scheduler params 0.0.0
>     pie: Restoring scheduler params 0.0.0
>     pie: Error (pie/restorer.c:351): Thread pid mismatch 10189/10188

This means that the PID of the iperf process is already in use on the
destination node, so CRIU cannot create the process (well, in this case
the thread) with the same PID as it used to have.
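
Before restoring, you can check whether the IDs from the log are already
taken on the destination. This is a rough sketch, assuming a Linux /proc;
`pid_in_use` is a hypothetical helper name, and the IDs in the loop are
simply the ones that appear in the commands and the error above.

```shell
# pid_in_use ID: hypothetical helper, not part of CRIU.
# Succeeds if a process or thread with this ID currently exists.
# Thread IDs are not listed by `ls /proc`, but /proc/<tid> can still
# be looked up directly.
pid_in_use() {
    [ -d "/proc/$1" ]
}

# IDs taken from the dump command and the "Thread pid mismatch" line.
for id in 10186 10188 10189; do
    if pid_in_use "$id"; then
        echo "$id is busy"
    else
        echo "$id is free"
    fi
done
```

The check is only a snapshot: an ID that is free now can be taken by
another task before the restore runs.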

>     pie: Restoring scheduler params 0.0.0
>     pie: Error (pie/restorer.c:392): Restorer abnormal termination for 101>
>     pie: 86
>     (00.138423) Error (cr-restore.c:1812): Restoring FAILED.
> 
> I suspect this is being triggered by the fact that iperf is threaded-
> am I right? What's the correct way to get this to restore?

One way is to run iperf inside a PID namespace. But since you would
also have to manage the IP address somehow, you might want to use a
net namespace as well (in this case, by the way, the connection
locking would work differently).
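
As an illustration of the namespace approach, the server could be started
through unshare(1) from util-linux. This is a sketch under assumptions, not
a tested recipe: `in_new_ns` and the `DRY_RUN` variable are made up for the
example, it needs root and a util-linux recent enough to have
`--mount-proc`, and the new net namespace starts with no configured
interfaces, so the address still has to be set up inside it.

```shell
# in_new_ns CMD...: hypothetical wrapper, not part of CRIU.
# Starts CMD as PID 1 of a new PID namespace with its own /proc mount
# and its own net namespace. With DRY_RUN set, only prints the command.
in_new_ns() {
    ${DRY_RUN:+echo} unshare --pid --fork --mount-proc --net "$@"
}

# Print what would be run for the iperf server from this thread.
DRY_RUN=1 in_new_ns iperf -s
```

Inside the namespace the IDs start from 1 again, so they do not clash with
whatever is running on the destination.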

Do you really need to copy the image files to another box? If you just
want to play with it, it's enough to restore from them on the same box.

Thanks,
Pavel


