[CRIU] criu + threaded program + TCP_REPAIR

Pavel Emelyanov xemul at parallels.com
Tue Oct 14 03:29:41 PDT 2014


On 10/14/2014 02:04 PM, Sowmini Varadhan wrote:
> 
> 
> On (10/14/14 10:34), Pavel Emelyanov wrote:
>     :
>>> 4. Add the server's address on dummy0 on the client
>>>       client# ip link add dummy0 type dummy
>>>       client# ip addr add <srvaddr>  dev dummy0
>>>
>>> 5. Copy the checkpoint files over to the client (duplicate the dir structure)
>>>    and restore
>>
>> First of all, it's not enough to just copy the files. If you want
>> to move a TCP connection you should at least make sure that
>>
>> a) the same IP address as was on source node is available on destination
> 
> yes, you can see that I did that in step 4 (otherwise bind() would fail)

Ah, I've missed that. Then yes, it should work.
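Condition (a) can be checked on the destination before running criu restore, in the same way the restored socket will hit it: by trying to bind() to the address. This is a minimal sketch, IPv4 only, and `address_is_local` is a hypothetical helper name. It reports a false positive if net.ipv4.ip_nonlocal_bind is set to 1, because the kernel then accepts binds to addresses that are not configured.

```python
import errno
import socket


def address_is_local(ip: str) -> bool:
    """Hypothetical helper: True if `ip` is configured on some local interface.

    Performs the same bind() the restored socket will need. Port 0 lets the
    kernel pick any free port, so only the address itself is being tested.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((ip, 0))
        except OSError as e:
            if e.errno == errno.EADDRNOTAVAIL:
                return False  # address is not assigned to any interface
            raise
    return True
```

On the client, after step 4, `address_is_local("<srvaddr>")` should return True.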

>> b) the netfilter rule that CRIU created on dump to lock the connection
>>    exists on the destination (http://criu.org/TCP_connection)
> 
> That web-site seems to say that it should be enough to use
> --tcp-established on both dump and restore, was there something else I
> needed to do? 

The article says "This rule sits in the host netfilter tables after the criu dump
command finishes and it should be there when you issue the criu restore one." But
yes, the docs about TCP are rather confusing; we should try to make them better.
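The lock is a DROP rule on the connection's addresses and ports. It protects the connection while no socket exists: if a packet from the peer reached the destination host before the socket was restored, the kernel would answer with a RST and the peer would tear the connection down. Because the rule lives in the host's filter table and not in the image files, copying the images does not carry it over. The sketch below only builds an iptables command line of roughly that shape for the inbound direction. `lock_rule_argv` is a hypothetical helper, and the exact rule CRIU installs may differ between versions. The resulting argv would be run as root on the destination (e.g. with `subprocess.run(argv, check=True)`) before `criu restore --tcp-established`.

```python
import ipaddress


def lock_rule_argv(action: str, peer_ip: str, peer_port: int,
                   local_ip: str, local_port: int) -> list:
    """Hypothetical helper: iptables argv dropping inbound packets of one TCP connection.

    action is "-I" to install the lock, or "-D" to remove it by hand
    (e.g. if a restore attempt is abandoned). IPv4 only.
    """
    if action not in ("-I", "-D"):
        raise ValueError("action must be -I or -D")
    # IPv4Address raises ValueError on malformed input and normalises the text form.
    peer = str(ipaddress.IPv4Address(peer_ip))
    local = str(ipaddress.IPv4Address(local_ip))
    return [
        "iptables", "-t", "filter", action, "INPUT",
        "-p", "tcp",
        "--source", peer, "--sport", str(peer_port),
        "--destination", local, "--dport", str(local_port),
        "-j", "DROP",
    ]
```

For example, with illustrative addresses and iperf2's default port 5001 on the migrated side, `lock_rule_argv("-I", "192.0.2.2", 40000, "192.0.2.1", 5001)` builds the command that locks that one connection.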

>>>     pie: Restoring EXE link
>>>     pie: Restoring scheduler params 0.0.0
>>>     pie: Restoring scheduler params 0.0.0
>>>     pie: Error (pie/restorer.c:351): Thread pid mismatch 10189/10188
>>
>> This means that the PID of the iperf process is busy on the destination
>> node and CRIU cannot create the process (well, in this case thread) with
>> the same PID as it used to have.
> 
> yes, that might have been my problem. I see 10188 being used
> by another sshd process when I do 'ps -eLf'
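To spot this before restoring, one can collect every thread ID of the dumped process on the source (each thread has its own TID, and the log shows CRIU must recreate them too) and check which of them are already taken on the destination. This is a minimal sketch that reads /proc on Linux; the helper names are hypothetical. The check is racy, since a PID that is free now can be handed out before criu restore runs. Running the program in its own PID namespace avoids the problem entirely.

```python
from __future__ import annotations

import os


def tids_of(pid: int) -> list[int]:
    """Hypothetical helper, run on the source: all thread IDs of `pid`.

    The main thread's TID equals the PID, so it is included.
    """
    return sorted(int(name) for name in os.listdir(f"/proc/{pid}/task"))


def busy_ids(ids: list[int]) -> list[int]:
    """Hypothetical helper, run on the destination: IDs already in use.

    /proc/<tid> can be looked up directly even for non-leader threads,
    which do not appear when listing /proc itself.
    """
    return [i for i in ids if os.path.exists(f"/proc/{i}")]
```

On the client, `busy_ids([10188, 10189])` would have flagged 10188, the ID that `ps -eLf` showed under sshd.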
> 
>> One of the ways is to run iperf inside a PID namespace. But since
>> you would also have to manage the IP address somehow, you might want
>> to use a net namespace as well (in this case, btw, the connection
>> locking would work the other way).
> 
> I see. I'm not sure I'd actually need the netns, the dummy interface
> should suffice, no?

With a dummy interface the traffic may stop flowing. If you manage to configure
routing and tell the switches that the IP address is now behind a different MAC,
then OK, but that heavily depends on your actual network configuration. The
simplest way is to use containers.
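Telling the switches that an address has moved is usually done with a gratuitous ARP: a broadcast ARP request whose sender and target IP are both the moved address, sent from the interface that actually faces the switch (not dummy0, which has no link). This sketch only builds the 42-byte Ethernet frame in the RFC 5227 "ARP Announcement" layout; `gratuitous_arp_frame` is a hypothetical name. Sending it needs an AF_PACKET socket bound to the switch-facing interface and root, or a tool such as iputils `arping -U -I <iface> <srvaddr>`.

```python
import ipaddress
import struct


def gratuitous_arp_frame(mac: str, ip: str) -> bytes:
    """Hypothetical helper: build an RFC 5227-style ARP announcement frame.

    Switches learn which port `mac` sits behind from the source address,
    and neighbours that already cache `ip` update it to point at `mac`.
    """
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must have exactly 6 octets")
    ip_bytes = ipaddress.IPv4Address(ip).packed

    # Ethernet II header: broadcast destination, our MAC, EtherType 0x0806 (ARP)
    eth_header = b"\xff" * 6 + mac_bytes + struct.pack("!H", 0x0806)

    # ARP header: Ethernet (1), IPv4 (0x0800), 6-byte MACs, 4-byte IPs, opcode 1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC/IP, then target MAC (zero, per RFC 5227) and target IP (same as sender)
    arp += mac_bytes + ip_bytes + b"\x00" * 6 + ip_bytes
    return eth_header + arp
```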

>> Do you really need to copy the image files on another box? If you just
>> want to play with it it's enough to restore from them on the same box.
> 
> I was actually trying to see if I could get a simplified version
> of live-migration (I see the lxc migration work went in very recently,
> and wanted something that was a little less than bleeding-edge, so
> that I could get over my user-errors first).

Ah, live migration :) You may be interested in https://github.com/xemul/p.haul
then. It's a wrapper on top of CRIU that puts all the bits and pieces together to
make live migration work. It's quite raw and doesn't handle TCP other than in
openvz mode, but it may be a good starting point for the rest.

Thanks,
Pavel


