[CRIU] [PATCH] mount: restore mounts in the root mount namespace

Andrei Vagin avagin at virtuozzo.com
Wed Feb 15 11:25:26 PST 2017


On Wed, Feb 15, 2017 at 10:17:53PM +0300, Pavel Emelyanov wrote:
> On 02/15/2017 09:53 PM, Andrei Vagin wrote:
> > On Wed, Feb 15, 2017 at 01:26:52PM +0300, Pavel Emelyanov wrote:
> >> On 02/15/2017 12:49 AM, Andrei Vagin wrote:
> >>> From: Andrei Vagin <avagin at virtuozzo.com>
> >>>
> >>> Currently mounts are restored in a mount namespace, then
> >>> this namespace is cloned to create mount namespaces for
> >>> processes and finaly the first namespace is destroyed after
> >>> cleaning remap files.
> >>>
> >>> But it doesn't work in a case, when we have to open file
> >>> descriptores to restore mounts (e g to restore bind-mount
> >>> pty slaves), because on the next iteration criu will not
> >>> find mounts for this files.
> >>
> >> I don't get this. Would you describe the problem in more detail?
> > 
> > Look at this example:
> > 
> > The origin mount namespace has a bind mount /dev/pts/0 -> /dev/console
> > 
> > And one of processes has /dev/ptmx which is linked with /dev/pts/0
> > 
> > How we restore this construction:
> > 1. Fork the init process into a new mount namespace (lets call it A)
> > 2. Restore mounts
> >    - open /dev/ptmx to bind-mount /dev/pts/0 and save the master fd in
> >      fdstore
> >    - bind-mount /dev/pts/0 into /dev/console
> > 3. Create namespace which will be installed to processes
> 
> Let's call it B :)
> 
> > 4. Restore file descriptros
> >    - get the master fd for /dev/pts/0 from fdstore
> > 
> > This patch change the third step. You can see that we open a file
> > desctriptor in the A namespace, so if we don't intall it to one of
> > processes, it will be extern next time, becase criu will not be able to
> > find its mount by mntid.
> 
> So the problem I see from this description is that in fdstore we keep
> the descriptor for A, while should for B. Is that correct?

or A should be installed as a restore namespace, what I do in this
patch.

> 
> > Currently CRIU uses the A namespace to clean up remap files and clone
> > this namespace to restore namespaces for processes.
> > 
> > With this patch, the A namespace will be used as a root mount namespace
> > and we will clone it to create a mount namespace, which will be used to
> > clean up remap files.
> > 
> >>
> >>> In this patch the first namespace is restored as the root
> >>> mount namespace and we clone the first namespace to create
> >>> a namespace whichc is used to clean remap files.
> >>>
> >>> $ ./zdtm.py run -t zdtm/static/pty-console --iter 5
> >>> ====================== Run zdtm/static/pty-console in ns =======================
> >>> Start test
> >>> Test is SUID
> >>> ./pty-console --pidfile=pty-console.pid --outfile=pty-console.out
> >>> Run criu dump
> >>> Run criu restore
> >>> Run criu dump
> >>> =[log]=> dump/zdtm/static/pty-console/36/2/dump.log
> >>> ------------------------ grep Error ------------------------
> >>> (00.106521) Error (criu/files-reg.c:1132): Can't lookup mount=563 for fd=4 path=/ptmx
> >>> (00.106585) Error (criu/cr-dump.c:1325): Dump files (pid: 70) failed with -1
> >>> (00.129041) Error (criu/cr-dump.c:1674): Dumping FAILED.
> >>> ------------------------ ERROR OVER ------------------------
> >>>
> >>> Signed-off-by: Andrei Vagin <avagin at virtuozzo.com>
> >>> ---
> >>>  criu/cr-restore.c       | 21 +++++++++++++++++----
> >>>  criu/include/restorer.h |  5 +++++
> >>>  criu/mount.c            | 32 +++++++++++++++++++++++++++-----
> >>>  3 files changed, 49 insertions(+), 9 deletions(-)
> >>>
> >>> diff --git a/criu/cr-restore.c b/criu/cr-restore.c
> >>> index d75aa0d..e20036b 100644
> >>> --- a/criu/cr-restore.c
> >>> +++ b/criu/cr-restore.c
> >>> @@ -1431,6 +1431,9 @@ static int restore_task_with_children(void *_arg)
> >>>  		if (prepare_namespace(current, ca->clone_flags))
> >>>  			goto err;
> >>>  
> >>> +		if (restore_finish_stage(task_entries, CR_STATE_POST_RESTORE_NS) < 0)
> >>> +			goto err;
> >>> +
> >>>  		if (root_prepare_shared())
> >>>  			goto err;
> >>>  
> >>> @@ -1494,6 +1497,7 @@ static inline int stage_participants(int next_stage)
> >>>  	case CR_STATE_FAIL:
> >>>  		return 0;
> >>>  	case CR_STATE_RESTORE_NS:
> >>> +	case CR_STATE_POST_RESTORE_NS:
> >>>  	case CR_STATE_RESTORE_SHARED:
> >>>  		return 1;
> >>>  	case CR_STATE_FORKING:
> >>> @@ -1850,16 +1854,25 @@ static int restore_root_task(struct pstree_item *init)
> >>>  	if (ret)
> >>>  		goto out_kill;
> >>>  
> >>> +	ret = run_scripts(ACT_SETUP_NS);
> >>> +	if (ret)
> >>> +		goto out_kill;
> >>> +
> >>> +	ret = restore_switch_stage(CR_STATE_POST_RESTORE_NS);
> >>> +	if (ret < 0)
> >>> +		goto out_kill;
> >>> +
> >>> +	pr_info("Wait until namespaces are created\n");
> >>> +	ret = restore_wait_inprogress_tasks();
> >>> +	if (ret)
> >>> +		goto out_kill;
> >>> +
> >>>  	if (root_ns_mask & CLONE_NEWNS) {
> >>>  		mnt_ns_fd = open_proc(init->pid->real, "ns/mnt");
> >>>  		if (mnt_ns_fd < 0)
> >>>  			goto out_kill;
> >>>  	}
> >>>  
> >>> -	ret = run_scripts(ACT_SETUP_NS);
> >>> -	if (ret)
> >>> -		goto out_kill;
> >>> -
> >>>  	if (opts.empty_ns & CLONE_NEWNET) {
> >>>  		/*
> >>>  		 * Local TCP connections were locked by network_lock_internal()
> >>> diff --git a/criu/include/restorer.h b/criu/include/restorer.h
> >>> index 23c7e24..7c65538 100644
> >>> --- a/criu/include/restorer.h
> >>> +++ b/criu/include/restorer.h
> >>> @@ -203,6 +203,11 @@ static inline unsigned long restorer_stack(struct restore_mem_zone *mz)
> >>>  enum {
> >>>  	CR_STATE_FAIL		= -1,
> >>>  	CR_STATE_RESTORE_NS	= 0, /* is used for executing "setup-namespace" scripts */
> >>> +	/*
> >>> +	 * Need to wait a mount namespace which
> >>> +	 * will be used to clean up remap files.
> >>> +	 */
> >>> +	CR_STATE_POST_RESTORE_NS,
> >>>  	CR_STATE_RESTORE_SHARED,
> >>>  	CR_STATE_FORKING,
> >>>  	CR_STATE_RESTORE,
> >>> diff --git a/criu/mount.c b/criu/mount.c
> >>> index 8f66b4e..013b17e 100644
> >>> --- a/criu/mount.c
> >>> +++ b/criu/mount.c
> >>> @@ -2948,17 +2948,39 @@ int prepare_mnt_ns(void)
> >>>  			goto err;
> >>>  		}
> >>>  
> >>> +		if (nsid->type == NS_ROOT) {
> >>> +			int fd;
> >>> +
> >>> +			/*
> >>> +			 * We need to create a mount namespace which will be
> >>> +			 * used to clean up remap files
> >>> +			 * (depopulate_roots_yard).  The namespace where mounts
> >>> +			 * was restored has to be restored as a root mount
> >>> +			 * namespace, because there are file descriptors
> >>> +			 * linked with it (e.g. to bind-mount slave pty-s).
> >>> +			 */
> >>> +			fd = open_proc(PROC_SELF, "ns/mnt");
> >>> +			if (fd < 0)
> >>> +				goto err;
> >>> +			if (setns(rst, CLONE_NEWNS)) {
> >>> +				pr_perror("Can't restore mntns back");
> >>> +				goto err;
> >>> +			}
> >>> +			nsid->mnt.ns_fd = rst;
> >>> +			rst = fd;
> >>> +		} else {
> >>> +			/* Pin one with a file descriptor */
> >>> +			nsid->mnt.ns_fd = open_proc(PROC_SELF, "ns/mnt");
> >>> +			if (nsid->mnt.ns_fd < 0)
> >>> +				goto err;
> >>> +		}
> >>> +
> >>>  		/* Set its root */
> >>>  		path[0] = '/';
> >>>  		print_ns_root(nsid, 0, path + 1, sizeof(path) - 1);
> >>>  		if (cr_pivot_root(path))
> >>>  			goto err;
> >>>  
> >>> -		/* Pin one with a file descriptor */
> >>> -		nsid->mnt.ns_fd = open_proc(PROC_SELF, "ns/mnt");
> >>> -		if (nsid->mnt.ns_fd < 0)
> >>> -			goto err;
> >>> -
> >>>  		/* root_fd is used to restore file mappings */
> >>>  		nsid->mnt.root_fd = open_proc(PROC_SELF, "root");
> >>>  		if (nsid->mnt.root_fd < 0)
> >>>
> >>
> > .
> > 
> 


More information about the CRIU mailing list