[CRIU] Patch for the migration feature to change ip

Pavel Emelyanov xemul at parallels.com
Mon Jul 14 06:26:21 PDT 2014


On 07/14/2014 05:19 PM, 孙亚 wrote:
> At 2014-07-14 05:18:58, "Pavel Emelyanov" <xemul at parallels.com> wrote:
>>On 07/13/2014 12:41 PM, 孙亚 wrote:
>>> At 2014-07-11 07:09:52, "Pavel Emelyanov" <xemul at parallels.com> wrote:
>>>>On 07/11/2014 10:00 AM, 孙亚 wrote:
>>>>> Hi there,
>>>>>  To support migration where the IP changes to that of the target machine the program migrates to,
>>>>>  I added a '-m' option to the main function in crtools.c, along with code that makes -m valid only
>>>>>  for the restore operation. I also added a data member to the opts struct defined in cr_option.h.
>>>>> 
>>>>> When the user runs a command like this:
>>>>> 
>>>>> '''criu restore -D targetFiles -m 192.168.0.1 --tcp-established '''
>>>>> 
>>>>> the IP will be changed to 192.168.0.1 in the restore_sockaddr function in sk-inet.c.
>>>>
>>>>But there can be many sockets, which of them will have the ip changed into 192.168.0.1?
>>> It depends on whether all the sockets to be restored are handled by the restore_sockaddr function. If they are, then
>>> all the sockets will be changed to 192.168.0.1. And according to my test, they are.
>>
>>Well, this is not good as there can be more than one socket in the image.
> Sure.
>>
>>>>
>>>>> Of course, the program will be restored, but the TCP connection will be broken because of the IP
>>>>> change. The program should therefore have error-handling code for this scenario.
>>>>
>>>>Maybe it's just better to close the connection while restoring instead of fixing the ip address?
>>> 
>>> Yep, that is what the Berkeley C/R system does; it doesn't support restoring sockets. I guess if we
>>> close the connection during dumping, there would have to be a complex strategy for restoring the connections, including:
>>> 1) allowing the user to indicate the IP they want to migrate to; 2) allowing the user to decide which IP should be
>>> assigned to which socket; 3) dropping the packets in the queue of the current connection if there is no way
>>> to redirect them into the new connection and keep them consistent with the other end of the connection.
>>> I guess it's difficult for CRIU to know how to reconnect to the server program with new IPs. In fact,
>>> the complex step for the user is step 2), which also cannot be avoided if we want to achieve dumping and
>>> restoring of TCP connections in CRIU.
>>> At the same time, we cannot avoid step 3) either. So what we have decided to do for now is to let the developer
>>> of the client program handle this situation. But before they can use their error-handling code to deal with the
>>> situation above, we first need to have the program restored. So we change the IP and restore the program, which
>>> will then detect the broken connection and run its error-handling code, such as reconnecting or exiting, after restoration.
>>> 
>>> In short, what we do is just give the program a trigger to handle this situation; what needs to be done is
>>> decided by the client program.
>>> By the way, after discussing it among ourselves, we don't think restoring TCP connections is reasonable in all
>>> situations, especially migration. Even if you can restore one end of the connection, if you cannot
>>> restore the other end, the two ends become inconsistent, which
>>> goes against the TCP/IP principles.
>>
>>That's interesting. CRIU indeed doesn't know much details about processes and their connections,
>>but on the other hand CRIU can call custom hooks so that external code could help. What if we
>>put more callbacks into CRIU's TCP/IP sockets dumping and restoring code, so that you could
>>write a plugin with any logic you need to handle that case? The plugins API is in the include/plugin.h
>>and plugin.c files. Currently, there are no hooks for TCP/IP sockets, but if you could propose
>>where to put those (with an example of a plugin) I would gladly merge these changes.
> That's great. I will do it as soon as possible.

Awesome!
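
The per-socket hook discussed above could look roughly like the sketch below. Everything named here is an assumption for illustration: the `sk_addr_hook_t` signature, the `ip_rule`/`ip_rules` structures and `remap_ip_hook` are hypothetical and are not part of CRIU's plugin API. The sketch shows a callback that rewrites the address only for sockets whose dumped IP matches a rule, so several sockets in one image can be treated differently.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>

/* Hypothetical hook signature (NOT an existing CRIU API): called once per
 * socket before its address is used on restore. Returns 1 if the address
 * was rewritten, 0 if it was left untouched, -1 on error. */
typedef int (*sk_addr_hook_t)(struct sockaddr_in *addr, void *priv);

/* One rewrite rule: sockets dumped with old_ip are restored with new_ip. */
struct ip_rule {
	const char *old_ip;
	const char *new_ip;
};

/* Rule table handed to the hook through its private pointer. */
struct ip_rules {
	const struct ip_rule *rules;
	size_t nr;
};

/* Example callback: rewrite only the addresses that match a rule. */
static int remap_ip_hook(struct sockaddr_in *addr, void *priv)
{
	const struct ip_rules *tbl = priv;
	struct in_addr old_a, new_a;
	size_t i;

	for (i = 0; i < tbl->nr; i++) {
		if (inet_pton(AF_INET, tbl->rules[i].old_ip, &old_a) != 1 ||
		    inet_pton(AF_INET, tbl->rules[i].new_ip, &new_a) != 1)
			return -1; /* malformed rule */
		if (addr->sin_addr.s_addr == old_a.s_addr) {
			addr->sin_addr = new_a; /* the port is kept as dumped */
			return 1;
		}
	}
	return 0; /* no rule matched, address unchanged */
}
```

A caller would fill an `ip_rules` table (for example from command-line arguments), then pass each socket's address through `remap_ip_hook` before binding or connecting. Sockets that match no rule keep their original address.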

> And in fact I have another question for you: I find that if the pid of the process to be restored is already
> occupied by another process (which could happen during migration or when restoring locally), the process to be
> restored is assigned a new pid. But because the new pid differs from the original one, the restore operation
> fails, like this:
> '''
> Error (cr-restore.c:1227): Pid 17047 do not match expected 17046
> Error (cr-restore.c:1036): 17047 exited, status=255
> Error (cr-restore.c:1590): Restoring FAILED.
> '''
> So would it be better to give the user an option to restore the process with a new pid?

It's possible only in theory. AFAIK glibc caches process' pids and uses them
for some internal needs. So once we change task's pid things in glibc can get
broken. But this deserves individual research.

Try restoring into a new pid namespace; this would help with the pid mismatch.
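
To illustrate why a new pid namespace helps, here is a minimal Linux-only sketch. It is not how CRIU itself sets things up, and the helper names `report_pid` and `child_is_pid1_in_new_pidns` are made up for this example. It assumes the caller has the privilege to create pid namespaces (typically CAP_SYS_ADMIN); without it, clone() fails and the helper returns -1. The first task created in a fresh pid namespace sees itself as pid 1, so the pids used inside cannot collide with pids already taken on the host.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stack for the cloned child; clone() expects a pointer to its top. */
static char child_stack[64 * 1024] __attribute__((aligned(16)));

/* Runs as the first task of the new namespace and reports, through its
 * exit code, whether it sees itself as pid 1. */
static int report_pid(void *arg)
{
	(void)arg;
	_exit(getpid() == 1 ? 0 : 1);
}

/* Returns 1 if the child saw itself as pid 1, 0 if it did not, and -1 if
 * the namespace could not be created (e.g. missing privileges). */
static int child_is_pid1_in_new_pidns(void)
{
	int status;
	pid_t pid;

	pid = clone(report_pid, child_stack + sizeof(child_stack),
		    CLONE_NEWPID | SIGCHLD, NULL);
	if (pid < 0)
		return -1;
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

From the parent's point of view the child still has an ordinary host pid (the value clone() returns); only inside the namespace does numbering start again from 1.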

Thanks,
Pavel


