[Devel] user-cr: Extra unshare() calls ?

Sukadev Bhattiprolu sukadev at linux.vnet.ibm.com
Mon Mar 8 12:13:39 PST 2010


Came across this while testing LXC.


1. Does ckpt_remount_proc() need to unshare() ? Or can we have the
   clone() that calls __ckpt_coordinator() clone with CLONE_NEWNS|CLONE_FS
   instead ?

   The problem with the unshare() in ckpt_remount_proc() is that it
   creates an extra level in the cgroup hierarchy (see below) after
   restart. So applications expecting the same cgroup hierarchy as
   before the checkpoint will be surprised.

2. When --mount-pty (or --mntns) is specified, do we need to unshare() 
   in the parent process ? Considering only the full-container restart
   for now (ignore self-restart and subtree restart), can we just
   specify (CLONE_NEWNS|CLONE_FS) at the time of creating the first
   restarted process ?
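
For reference, a minimal sketch of what both questions have in mind: pick
the namespace flags once and hand them to the clone() that creates the
first restarted process, instead of calling unshare() afterwards.
coordinator_clone_flags() and its parameters are made up for illustration
and are not in restart.c. One assumption to double-check: as I read
clone(2), passing CLONE_NEWNS together with CLONE_FS to clone() fails with
EINVAL, and a child created without CLONE_FS gets its own copy of the
filesystem information anyway, so the sketch sets CLONE_NEWNS alone.
(unshare() is different: there CLONE_FS means "stop sharing".)

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>

/* Fallbacks in case the libc headers do not expose these flags. */
#ifndef CLONE_NEWNS
#define CLONE_NEWNS	0x00020000
#endif
#ifndef CLONE_NEWPID
#define CLONE_NEWPID	0x20000000
#endif

/*
 * Hypothetical helper (not in restart.c): compute the clone() flags for
 * the first restarted process so that the mounts namespace is created by
 * the same clone() that creates the pid namespace.
 */
static unsigned long coordinator_clone_flags(int want_pidns, int want_mntns)
{
	unsigned long flags = SIGCHLD;

	if (want_pidns)
		flags |= CLONE_NEWPID;

	/* CLONE_FS is deliberately not set here (see the note above). */
	if (want_mntns)
		flags |= CLONE_NEWNS;

	return flags;
}
```

The result would be passed as the flags argument of clone(), with
want_mntns taken from something like args->mntns so that a restart
without --mntns keeps sharing the parent's mounts namespace.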

Here is an example (using LXC) that shows the problems I am running into.
Attached is a quick hack to point out the unshare() calls I am referring
to.
   
If I create a simple container with LXC

	$ lxc-execute --name foo --rcfile lxc-macvlan.conf -- /bin/sleep 1000

It creates the following three processes:

	PID   PPID  CMD

        3350  3239  lxc-execute --name foo -- /bin/sleep 1000
        3353  3350  /usr/local/libexec/lxc-init -- /bin/sleep 1000
        3357  3353  /bin/sleep 1000

A new cgroup named 'foo' is created (the cgroup is initially named after
the pid of lxc-init and is renamed to 'foo' from user space). This cgroup
is in the root cgroup directory and has two tasks (lxc-init, sleep):

        $ cat /cgroup/foo/tasks
        3353
        3357

When I checkpoint and restart this container (using the equivalent of
the --pidns --pids --mount-pty options to /bin/restart), I get three
processes:

        3434  3375  ./lxc_restart --name bar --statefile=/root/foo.ckpt
        3436  3434  /usr/local/libexec/lxc-init -- /bin/sleep 1000
        3437  3436  /bin/sleep 1000

But the directory in /cgroup referring to lxc-init is 3 levels deep:

        $ ls /cgroup/3434/3436/1
	cgroup.procs  freezer.state  notify_on_release  tasks

Here is the complete hierarchy created after the restart:

        $ ls -R /cgroup/3434
        /cgroup/3434:
        3436  cgroup.procs  freezer.state  notify_on_release  tasks

        /cgroup/3434/3436:
        1  cgroup.procs  freezer.state  notify_on_release  tasks

        /cgroup/3434/3436/1:
        cgroup.procs  freezer.state  notify_on_release  tasks

        $ cat /cgroup/3434/tasks
        3434

        $ cat /cgroup/3434/3436/tasks   # empty

        $ cat /cgroup/3434/3436/1/tasks
        3436
        3437
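
To check the nesting from a test rather than by eye, something like the
following could be used. It is only a sketch: cgroup_depth() is a
hypothetical helper, not part of user-cr or LXC, and it just counts path
components below the cgroup mount point without touching the filesystem.

```c
#include <string.h>

/*
 * Hypothetical helper: number of directory levels between a cgroup mount
 * point (root) and a cgroup directory (path) below it.
 * Returns -1 if path is not located under root.
 */
static int cgroup_depth(const char *root, const char *path)
{
	size_t len = strlen(root);
	const char *p;
	int depth = 0;

	/* ignore trailing slashes in root ("/cgroup/" == "/cgroup") */
	while (len > 0 && root[len - 1] == '/')
		len--;

	if (strncmp(path, root, len) != 0)
		return -1;

	p = path + len;
	if (*p != '\0' && *p != '/')
		return -1;	/* e.g. root "/cgroup", path "/cgroup2/x" */

	while (*p != '\0') {
		while (*p == '/')	/* skip separators */
			p++;
		if (*p == '\0')
			break;
		depth++;		/* found one more component */
		while (*p != '\0' && *p != '/')
			p++;
	}
	return depth;
}
```

With /cgroup as the mount point, "/cgroup/3434/3436/1" gives a depth of 3,
whereas the original "/cgroup/foo" gives 1.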

I think we get the directory /cgroup/3434 due to the following unshare()
in app_restart():

		/* private mounts namespace ? */
		if (args->mntns && unshare(CLONE_NEWNS | CLONE_FS) < 0) {
			ckpt_perror("unshare");
			exit(1);
		}

And we get the "3436/1" directory due to the unshare() in ckpt_remount_proc().

The following hack seems to remove both extra levels: the lxc_restart
command now creates just "/cgroup/3436" (which LXC renames to
"/cgroup/bar").

---
From: Sukadev Bhattiprolu <sukadev at linux.vnet.ibm.com>
Date: Mon, 8 Mar 2010 12:03:46 -0800
Subject: [PATCH 1/1] Minimize unshare() calls

---
 restart.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/restart.c b/restart.c
index c82de21..6ac51e3 100644
--- a/restart.c
+++ b/restart.c
@@ -459,10 +459,12 @@ int app_restart(struct app_restart_args *args)
 		exit(1);
 
 	/* private mounts namespace ? */
+#if 0
 	if (args->mntns && unshare(CLONE_NEWNS | CLONE_FS) < 0) {
 		ckpt_perror("unshare");
 		exit(1);
 	}
+#endif
 
 	/* chroot ? */
 	if (args->root && chroot(args->root) < 0) {
@@ -717,10 +719,12 @@ static int ckpt_probe_child(pid_t pid, char *str)
  */
 static int ckpt_remount_proc(struct ckpt_ctx *ctx)
 {
+#if 0
 	if (unshare(CLONE_NEWNS | CLONE_FS) < 0) {
 		ckpt_perror("unshare");
 		return -1;
 	}
+#endif
 	/* this is unlikely, but we don't want to fail */
 	if (umount2("/proc", MNT_DETACH) < 0) {
 		if (ckpt_cond_fail(ctx, CKPT_COND_MNTPROC)) {
@@ -778,6 +782,7 @@ static int ckpt_coordinator_pidns(struct ckpt_ctx *ctx)
 	int copy, ret;
 	genstack stk;
 	void *sp;
+	unsigned long flags = SIGCHLD;
 
 	ckpt_dbg("forking coordinator in new pidns\n");
 
@@ -802,7 +807,9 @@ static int ckpt_coordinator_pidns(struct ckpt_ctx *ctx)
 	copy = ctx->args->copy_status;
 	ctx->args->copy_status = 1;
 
-	coord_pid = clone(__ckpt_coordinator, sp, CLONE_NEWPID|SIGCHLD, ctx);
+	flags |= CLONE_NEWPID|CLONE_NEWNS|CLONE_FS;
+
+	coord_pid = clone(__ckpt_coordinator, sp, flags, ctx);
 	genstack_release(stk);
 	if (coord_pid < 0) {
 		ckpt_perror("clone coordinator");
-- 
1.6.6.1
