[CRIU] [PATCH] restore: handle exit code of the unlock network script
Andrew Vagin
avagin at parallels.com
Tue Mar 25 01:41:30 PDT 2014
On Tue, Mar 25, 2014 at 02:27:33AM +0400, Pavel Emelyanov wrote:
> On 03/24/2014 03:07 PM, Andrey Vagin wrote:
> > When we migrate processes from one host to another, we need to know
> > the moment at which the processes can be killed on the source host.
> > If a migration script is killed (segv, exception, etc.), the process
> > tree must not stay alive on both nodes, and we need to reduce the
> > chance of killing the processes altogether.
>
> I didn't quite get why the existing scheme used by p.haul is flawed.
> Can you draw a two-sided diagram of source-destination interaction
> and show where the problem is and how you propose to solve it?
    source                            destination
    ------                            -----------
    criu dump
      post-dump
                                      criu restore
                                        network unlock
                                        post-restore
    kill p.haul before receiving
      cr_rpc.RESTORE
    resume
In this case both hosts end up with live process trees.

I also want to move post-restore before network_unlock, because we can't
fail after unlocking the network.
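
The ordering argued for here can be sketched roughly as follows. This is
only an illustration, not the actual cr-restore.c code: run_post_restore(),
unlock_network(), resume_tree(), kill_restored_tree() and the trace array
are hypothetical stand-ins, and the "fail" arguments exist only so that the
error paths can be exercised. The point is that every step that may fail
runs before the network is unlocked, and any failure kills the restored
tree.

```c
#include <stdio.h>

/* Hypothetical step markers, recorded so the ordering can be inspected. */
enum step { STEP_POST_RESTORE, STEP_NET_UNLOCK, STEP_RESUME, STEP_KILL };

#define MAX_STEPS 8
static enum step trace[MAX_STEPS];
static int ntrace;

static void record(enum step s)
{
	if (ntrace < MAX_STEPS)
		trace[ntrace++] = s;
}

/* Stand-in for running the post-restore scripts; returns 0 on success. */
static int run_post_restore(int fail)
{
	record(STEP_POST_RESTORE);
	return fail ? -1 : 0;
}

/* Stand-in for network_unlock(); returns 0 on success. */
static int unlock_network(int fail)
{
	record(STEP_NET_UNLOCK);
	return fail ? -1 : 0;
}

static void resume_tree(void)
{
	record(STEP_RESUME);
}

static void kill_restored_tree(void)
{
	record(STEP_KILL);
}

/*
 * Run everything that can fail first. Unlocking the network is the last
 * fallible step; after it succeeds the tree is simply resumed.
 */
static int finish_restore(int post_restore_fails, int unlock_fails)
{
	ntrace = 0;

	if (run_post_restore(post_restore_fails) < 0)
		goto out_kill;

	if (unlock_network(unlock_fails) < 0)
		goto out_kill;

	resume_tree();
	return 0;

out_kill:
	kill_restored_tree();
	return -1;
}
```

With this ordering, a failing post-restore step never reaches the unlock,
and a failing unlock never reaches the resume.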
>
> > In this patch I suggest checking the exit code of the network-unlock
> > scripts and using the following scheme to restore processes:
> >
> > 1. The source host dumps the processes and stops in the post-dump
> >    script.
> > 2. Then it signals the other side to restore the process tree.
> > 3. The destination host restores the processes and stops in the
> >    network-unlock script.
> > 4. Then it signals the source host to kill the original process
> >    tree.
> > 5. At the final stage the destination host unlocks the network and
> >    resumes the migrated process tree.
> >
> > There is a small window between steps 4 and 5: if a migration script
> > is killed at that moment, the process tree will be killed on both
> > nodes. In exchange, there is zero chance of leaving two copies of
> > the process tree.
> >
> > Signed-off-by: Andrey Vagin <avagin at openvz.org>
> > ---
> > cr-restore.c | 4 +++-
> > include/net.h | 2 +-
> > net.c | 8 +++++---
> > 3 files changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/cr-restore.c b/cr-restore.c
> > index b352daa..22396a2 100644
> > --- a/cr-restore.c
> > +++ b/cr-restore.c
> > @@ -1517,7 +1517,9 @@ static int restore_root_task(struct pstree_item *init)
> > }
> >
> > /* Unlock network before disabling repair mode on sockets */
> > - network_unlock();
> > + ret = network_unlock();
> > + if (ret < 0)
> > + goto out_kill;
> >
> > /*
> > * -------------------------------------------------------------
> > diff --git a/include/net.h b/include/net.h
> > index a9f0d46..a02f98a 100644
> > --- a/include/net.h
> > +++ b/include/net.h
> > @@ -15,7 +15,7 @@ struct veth_pair {
> > };
> >
> > extern int network_lock(void);
> > -extern void network_unlock(void);
> > +extern int network_unlock(void);
> >
> > extern struct ns_desc net_ns_desc;
> >
> > diff --git a/net.c b/net.c
> > index ac66374..f7df130 100644
> > --- a/net.c
> > +++ b/net.c
> > @@ -594,15 +594,17 @@ int network_lock(void)
> > return run_scripts("network-lock");
> > }
> >
> > -void network_unlock(void)
> > +int network_unlock(void)
> > {
> > pr_info("Unlock network\n");
> >
> > + if ((current_ns_mask & CLONE_NEWNET) && run_scripts("network-unlock"))
> > + return -1;
> > +
> > cpt_unlock_tcp_connections();
> > rst_unlock_tcp_connections();
> >
> > - if (current_ns_mask & CLONE_NEWNET)
> > - run_scripts("network-unlock");
> > + return 0;
> > }
> >
> > int veth_pair_add(char *in, char *out)
> >
>
>
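
The five-step scheme from the quoted commit message can be checked with a
small model. This is a sketch under stated assumptions, not p.haul or CRIU
code: migrate(), struct outcome and copies() are hypothetical names, and
the model assumes that a killed migration script makes the pending
post-dump script fail (so the source tree resumes unless it was already
killed) and makes the pending network-unlock script fail (so the restored
tree is killed).

```c
#include <stdbool.h>

/* Which sides still have a live process tree once everything settles. */
struct outcome {
	bool source_alive;
	bool dest_alive;
};

/*
 * kill_after is the number of steps (0..5) of the scheme that completed
 * before the migration script died; 5 means the migration finished.
 */
static struct outcome migrate(int kill_after)
{
	struct outcome o;

	if (kill_after >= 5) {
		/* Step 5 done: network unlocked, migrated tree resumed. */
		o.source_alive = false;
		o.dest_alive = true;
		return o;
	}

	/*
	 * The script died early. Assumption: the source tree survives
	 * unless step 4 already killed it, and the destination tree is
	 * always killed because network-unlock reports an error.
	 */
	o.source_alive = kill_after < 4;
	o.dest_alive = false;
	return o;
}

/* Number of live copies of the process tree across both hosts. */
static int copies(struct outcome o)
{
	return (o.source_alive ? 1 : 0) + (o.dest_alive ? 1 : 0);
}
```

In this model no value of kill_after yields two live copies, and only
kill_after == 4 (the window between steps 4 and 5) yields none.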