[CRIU] [PATCH] restore: handle exit code of the unlock network script

Andrew Vagin avagin at parallels.com
Tue Mar 25 06:13:17 PDT 2014


On Tue, Mar 25, 2014 at 05:06:53PM +0400, Pavel Emelyanov wrote:
> On 03/25/2014 12:41 PM, Andrew Vagin wrote:
> > On Tue, Mar 25, 2014 at 02:27:33AM +0400, Pavel Emelyanov wrote:
> >> On 03/24/2014 03:07 PM, Andrey Vagin wrote:
> >>> When we migrate processes from one host to another, we need to know
> >>> the moment when the processes can be killed on the source host.
> >>> If a migration script is killed (segv, exception, etc.), the process tree
> >>> must not stay alive on both nodes, and we need to reduce the chance of
> >>> killing the process tree on both nodes.
> >>
> >> I didn't quite get why the existing scheme used by p.haul is flawed.
> >> Can you draw a two-sided diagram of source-destination interaction
> >> and show where the problem is and how you propose to solve it?
> > 
> > source				destination
> > criu dump
> > post-dump
> > 				criu restore
> > 				network unlock
> > 				post-restore
> > 				kill p.haul before receiving cr_rpc.RESTORE
> > resume
> > 
> > In this case both hosts will have live process trees...
> > 
> > And I want to move post-restore before network_unlock, because we can't
> > fail after unlocking the network.
> 
> OK, but this patch does something different.

No, it doesn't. This patch doesn't move post-restore; that will be done in
a separate patch. But network_unlock is the point after which the tree
can't be resumed on the source host.
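
To illustrate why the exit code matters, here is a minimal standalone sketch,
not the actual CRIU code. The helper names (run_unlock_script,
network_unlock_checked, restore_may_resume) and the simulated_status
parameter are made up for illustration; only the control flow follows the
patch quoted below: run the unlock script first, report its failure, and
let the caller kill the restored tree instead of resuming it.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for run_scripts("network-unlock"): returns the
 * script's exit status, 0 on success. The status is passed in so that
 * the sketch can be exercised without real scripts.
 */
static int run_unlock_script(int simulated_status)
{
	return simulated_status;
}

/*
 * Same shape as the patched network_unlock(): the script runs first,
 * and a non-zero status is reported to the caller instead of being
 * silently dropped.
 */
static int network_unlock_checked(int in_netns, int simulated_status)
{
	if (in_netns && run_unlock_script(simulated_status)) {
		fprintf(stderr, "network-unlock script failed\n");
		return -1;
	}

	/* The real code unlocks the TCP connections at this point. */
	return 0;
}

/*
 * Hypothetical caller: returns 1 if the restored tree may be resumed,
 * 0 if it has to be killed because the unlock step failed (this
 * corresponds to "goto out_kill" in the patch).
 */
static int restore_may_resume(int in_netns, int simulated_status)
{
	if (network_unlock_checked(in_netns, simulated_status) < 0)
		return 0;

	return 1;
}
```

With this shape, restore_may_resume(1, 1) returns 0, so a migration script
can use a failing network-unlock script to stop the destination from ever
resuming the tree. Outside a network namespace the script is not run at
all, so restore_may_resume(0, 1) still returns 1.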

> 
> > 
> >>
> >>> In this patch I suggest checking the exit code of the network-unlock
> >>> scripts and using the following scheme to restore processes:
> >>>
> >>> 1. The source host dumps processes and stops in the post-dump script.
> >>> 2. Then it sends a signal to restore the process tree on the other side.
> >>> 3. The destination host restores processes and stops in the
> >>>    network-unlock script.
> >>> 4. Then it sends a signal to the source host to kill the original
> >>>    process tree.
> >>> 5. At the final stage the destination host unlocks the network and
> >>>    resumes the migrated process tree.
> >>>
> >>> We have a small window between the 4th and 5th steps: if a
> >>> migration script is killed at this moment, the process tree will be
> >>> killed on both nodes. But we have zero chance of leaving two copies
> >>> of the process tree.
> >>>
> >>> Signed-off-by: Andrey Vagin <avagin at openvz.org>
> >>> ---
> >>>  cr-restore.c  | 4 +++-
> >>>  include/net.h | 2 +-
> >>>  net.c         | 8 +++++---
> >>>  3 files changed, 9 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/cr-restore.c b/cr-restore.c
> >>> index b352daa..22396a2 100644
> >>> --- a/cr-restore.c
> >>> +++ b/cr-restore.c
> >>> @@ -1517,7 +1517,9 @@ static int restore_root_task(struct pstree_item *init)
> >>>  	}
> >>>  
> >>>  	/* Unlock network before disabling repair mode on sockets */
> >>> -	network_unlock();
> >>> +	ret = network_unlock();
> >>> +	if (ret < 0)
> >>> +		goto out_kill;
> >>>  
> >>>  	/*
> >>>  	 * -------------------------------------------------------------
> >>> diff --git a/include/net.h b/include/net.h
> >>> index a9f0d46..a02f98a 100644
> >>> --- a/include/net.h
> >>> +++ b/include/net.h
> >>> @@ -15,7 +15,7 @@ struct veth_pair {
> >>>  };
> >>>  
> >>>  extern int network_lock(void);
> >>> -extern void network_unlock(void);
> >>> +extern int network_unlock(void);
> >>>  
> >>>  extern struct ns_desc net_ns_desc;
> >>>  
> >>> diff --git a/net.c b/net.c
> >>> index ac66374..f7df130 100644
> >>> --- a/net.c
> >>> +++ b/net.c
> >>> @@ -594,15 +594,17 @@ int network_lock(void)
> >>>  	return run_scripts("network-lock");
> >>>  }
> >>>  
> >>> -void network_unlock(void)
> >>> +int network_unlock(void)
> >>>  {
> >>>  	pr_info("Unlock network\n");
> >>>  
> >>> +	if ((current_ns_mask & CLONE_NEWNET) && run_scripts("network-unlock"))
> >>> +		return -1;
> >>> +
> >>>  	cpt_unlock_tcp_connections();
> >>>  	rst_unlock_tcp_connections();
> >>>  
> >>> -	if (current_ns_mask & CLONE_NEWNET)
> >>> -		run_scripts("network-unlock");
> >>> +	return 0;
> >>>  }
> >>>  
> >>>  int veth_pair_add(char *in, char *out)
> >>>

