[CRIU] [PATCH] restore: handle exit code of the unlock network script

Pavel Emelyanov xemul at parallels.com
Mon Mar 24 15:27:33 PDT 2014


On 03/24/2014 03:07 PM, Andrey Vagin wrote:
> When migrating processes from one host to another, we need to know the
> moment at which the processes can be killed on the source host.
> If a migration script is killed (segv, exception, etc.), the process tree
> must not live on both nodes, and we need to reduce the chance of
> killing the processes.

I didn't quite get why the existing scheme used by p.haul is flawed.
Can you draw a two-sided diagram of source-destination interaction
and show where the problem is and how you propose to solve it?

> In this patch I suggest checking the exit code of the network-unlock
> scripts and using the following scheme to restore processes:
> 
> 1. The source host dumps the processes and stops in the post-dump script.
> 2. It then sends a signal to restore the process tree on the other side.
> 3. The destination host restores the processes and stops in the
>    network-unlock script.
> 4. It then sends a signal to the source host to kill the original process
>    tree.
> 5. At the final stage, the destination host unlocks the network and resumes
>    the migrated process tree.
> 
> A small window remains between the 4th and 5th steps: if a migration
> script is killed at that moment, the process tree will be killed on
> both nodes. However, there is zero chance of leaving two copies of
> the process tree.
> 
> Signed-off-by: Andrey Vagin <avagin at openvz.org>
> ---
>  cr-restore.c  | 4 +++-
>  include/net.h | 2 +-
>  net.c         | 8 +++++---
>  3 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/cr-restore.c b/cr-restore.c
> index b352daa..22396a2 100644
> --- a/cr-restore.c
> +++ b/cr-restore.c
> @@ -1517,7 +1517,9 @@ static int restore_root_task(struct pstree_item *init)
>  	}
>  
>  	/* Unlock network before disabling repair mode on sockets */
> -	network_unlock();
> +	ret = network_unlock();
> +	if (ret < 0)
> +		goto out_kill;
>  
>  	/*
>  	 * -------------------------------------------------------------
> diff --git a/include/net.h b/include/net.h
> index a9f0d46..a02f98a 100644
> --- a/include/net.h
> +++ b/include/net.h
> @@ -15,7 +15,7 @@ struct veth_pair {
>  };
>  
>  extern int network_lock(void);
> -extern void network_unlock(void);
> +extern int network_unlock(void);
>  
>  extern struct ns_desc net_ns_desc;
>  
> diff --git a/net.c b/net.c
> index ac66374..f7df130 100644
> --- a/net.c
> +++ b/net.c
> @@ -594,15 +594,17 @@ int network_lock(void)
>  	return run_scripts("network-lock");
>  }
>  
> -void network_unlock(void)
> +int network_unlock(void)
>  {
>  	pr_info("Unlock network\n");
>  
> +	if ((current_ns_mask & CLONE_NEWNET) && run_scripts("network-unlock"))
> +		return -1;
> +
>  	cpt_unlock_tcp_connections();
>  	rst_unlock_tcp_connections();
>  
> -	if (current_ns_mask & CLONE_NEWNET)
> -		run_scripts("network-unlock");
> +	return 0;
>  }
>  
>  int veth_pair_add(char *in, char *out)
> 
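
For readers following the diff, the control flow the patch introduces can be
sketched in isolation. This is a minimal stand-alone model, not CRIU code:
sim_network_unlock(), sim_restore(), script_ok(), script_fail() and the
restore_result enum are hypothetical names, and the has_netns flag stands in
for the (current_ns_mask & CLONE_NEWNET) check. It assumes that the out_kill
label tears down the restored process tree.

```c
/* Hypothetical outcome of a simulated restore; not part of CRIU. */
enum restore_result { RESTORE_RESUMED, RESTORE_KILLED };

/* Stand-in for run_scripts("network-unlock"): returns the script status. */
typedef int (*unlock_script_fn)(void);

static int script_ok(void)   { return 0; }	/* script exited successfully */
static int script_fail(void) { return 1; }	/* script exited with an error */

/*
 * Models the patched network_unlock(): the script runs first, and a
 * non-zero status is reported to the caller before anything is unlocked.
 */
static int sim_network_unlock(int has_netns, unlock_script_fn script)
{
	if (has_netns && script())
		return -1;

	/* The real function unlocks the TCP connections at this point. */
	return 0;
}

/* Models the patched call site in restore_root_task(). */
static enum restore_result sim_restore(int has_netns, unlock_script_fn script)
{
	if (sim_network_unlock(has_netns, script) < 0)
		return RESTORE_KILLED;	/* corresponds to "goto out_kill" */

	return RESTORE_RESUMED;
}
```

With this model, sim_restore(1, script_fail) yields RESTORE_KILLED, while
sim_restore(0, script_fail) yields RESTORE_RESUMED because the script is
only run when a network namespace is being restored.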



