[CRIU] [PATCH] restore: handle exit code of the unlock network script

Pavel Emelyanov xemul at parallels.com
Tue Mar 25 06:06:53 PDT 2014


On 03/25/2014 12:41 PM, Andrew Vagin wrote:
> On Tue, Mar 25, 2014 at 02:27:33AM +0400, Pavel Emelyanov wrote:
>> On 03/24/2014 03:07 PM, Andrey Vagin wrote:
>>> When we migrate processes from one host to another, we need to know
>>> the moment at which the processes can be killed on the source host.
>>> If a migration script is killed (segv, exception, etc.), the process
>>> tree must not live on both nodes, and we need to reduce the chance of
>>> killing the processes on both nodes.
>>
>> I didn't quite get why the existing scheme used by p.haul is flawed.
>> Can you draw a two-sided diagram of source-destination interaction
>> and show where the problem is and how you propose to solve it?
> 
> source				destination
> criu dump
> post-dump
> 				criu restore
> 				network unlock
> 				post-restore
> 				kill p.haul before receiving cr_rpc.RESTORE
> resume
> 
> In this case both hosts will have live process trees...
> 
> And I want to move post-restore before network_unlock, because we can't
> fail after unlocking the network.

OK, but this patch does something different.

> 
>>
>>> In this patch I suggest checking the exit code of the network-unlock
>>> scripts and using the following scheme to restore processes:
>>>
>>> 1. The source host dumps processes and stops in the post-dump script.
>>> 2. Then it sends a signal to restore the process tree on the other side.
>>> 3. The destination host restores processes and stops in the
>>>    network-unlock script.
>>> 4. Then it sends a signal to the source host to kill the original
>>>    process tree.
>>> 5. At the final stage the destination host unlocks the network and
>>>    resumes the migrated process tree.
>>>
>>> There is a small window between the 4th and 5th steps: if a migration
>>> script is killed at this moment, the process tree will be killed on
>>> both nodes. But there is zero chance of leaving two copies of the
>>> process tree.
>>>
>>> Signed-off-by: Andrey Vagin <avagin at openvz.org>
>>> ---
>>>  cr-restore.c  | 4 +++-
>>>  include/net.h | 2 +-
>>>  net.c         | 8 +++++---
>>>  3 files changed, 9 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/cr-restore.c b/cr-restore.c
>>> index b352daa..22396a2 100644
>>> --- a/cr-restore.c
>>> +++ b/cr-restore.c
>>> @@ -1517,7 +1517,9 @@ static int restore_root_task(struct pstree_item *init)
>>>  	}
>>>  
>>>  	/* Unlock network before disabling repair mode on sockets */
>>> -	network_unlock();
>>> +	ret = network_unlock();
>>> +	if (ret < 0)
>>> +		goto out_kill;
>>>  
>>>  	/*
>>>  	 * -------------------------------------------------------------
>>> diff --git a/include/net.h b/include/net.h
>>> index a9f0d46..a02f98a 100644
>>> --- a/include/net.h
>>> +++ b/include/net.h
>>> @@ -15,7 +15,7 @@ struct veth_pair {
>>>  };
>>>  
>>>  extern int network_lock(void);
>>> -extern void network_unlock(void);
>>> +extern int network_unlock(void);
>>>  
>>>  extern struct ns_desc net_ns_desc;
>>>  
>>> diff --git a/net.c b/net.c
>>> index ac66374..f7df130 100644
>>> --- a/net.c
>>> +++ b/net.c
>>> @@ -594,15 +594,17 @@ int network_lock(void)
>>>  	return run_scripts("network-lock");
>>>  }
>>>  
>>> -void network_unlock(void)
>>> +int network_unlock(void)
>>>  {
>>>  	pr_info("Unlock network\n");
>>>  
>>> +	if ((current_ns_mask & CLONE_NEWNET) && run_scripts("network-unlock"))
>>> +		return -1;
>>> +
>>>  	cpt_unlock_tcp_connections();
>>>  	rst_unlock_tcp_connections();
>>>  
>>> -	if (current_ns_mask & CLONE_NEWNET)
>>> -		run_scripts("network-unlock");
>>> +	return 0;
>>>  }
>>>  
>>>  int veth_pair_add(char *in, char *out)
>>>
>>
>>
> .
> 
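For reference, the control flow that the quoted patch introduces can be
sketched as a small standalone C fragment. This is only an illustration
under stated assumptions: every name prefixed with sketch_ is a
hypothetical stand-in (the real code uses run_scripts(), network_unlock()
and the out_kill label in restore_root_task()), and the TCP unlock calls
are reduced to setting a flag.

```c
#include <stdio.h>

/* Same value as the Linux CLONE_NEWNET flag; defined locally so the
 * sketch builds without <sched.h> and _GNU_SOURCE. */
#define SKETCH_CLONE_NEWNET 0x40000000UL

/* Hypothetical knobs and observers, not part of CRIU. */
unsigned long sketch_ns_mask = SKETCH_CLONE_NEWNET;
int sketch_script_status;	/* exit status the fake script reports */
int sketch_tcp_unlocked;	/* set once connections are unlocked */
int sketch_tree_killed;		/* set when the restored tree is killed */

/* Stand-in for run_scripts(): pretends to run the named action script. */
int sketch_run_scripts(const char *action)
{
	printf("run script: %s -> %d\n", action, sketch_script_status);
	return sketch_script_status;
}

/* Mirrors the patched network_unlock(): script first, then unlock. */
int sketch_network_unlock(void)
{
	if ((sketch_ns_mask & SKETCH_CLONE_NEWNET) &&
	    sketch_run_scripts("network-unlock"))
		return -1;

	/* Stands in for cpt_/rst_unlock_tcp_connections(). */
	sketch_tcp_unlocked = 1;
	return 0;
}

/* Mirrors the patched call site in restore_root_task(). */
int sketch_finish_restore(void)
{
	sketch_tcp_unlocked = 0;
	sketch_tree_killed = 0;

	if (sketch_network_unlock() < 0) {
		sketch_tree_killed = 1;	/* stands in for "goto out_kill" */
		return -1;
	}
	return 0;
}
```

With sketch_script_status set to a nonzero value,
sketch_finish_restore() returns -1, leaves the connections locked, and
marks the tree as killed. With it set to 0, the connections are unlocked
and the restore proceeds.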



