[Devel] Re: C/R without "leaks" (was: Re: Creating tasks on restart: userspace vs kernel)

Greg Kurz gkurz at fr.ibm.com
Wed Apr 15 15:42:17 PDT 2009


On Wed, 2009-04-15 at 23:56 +0400, Alexey Dobriyan wrote:
> > Again: so to checkpoint one task in the topmost pid-ns, you need to
> > checkpoint (if at all possible) the entire system?!
> 
> One more argument for not allowing "leaks" and for checkpointing the whole
> container, no ifs, buts and woulditbenices.
> 
> Just to clarify: C/R with a "leak" is, for example, when a process has a
> separate pidns but shares its netns with another process that is not
> involved in the checkpoint.
> 
> If you allow this, you lose one important property of the checkpoint part,
> namely that almost everything is frozen. Losing this property means that
> suddenly much more stuff is alive during the dump, and you have to account
> for more stuff when checkpointing. You are effectively checkpointing live
> data structures, and there is no guarantee you'll get it right.
> 
> Example 1: utsns is shared with the rest of the world.
> 
> utsns content can be modified only by tasks inside that namespace (via
> current->nsproxy->uts_ns). Consequently, if you allow "leaks", a task
> outside the checkpoint can modify utsns content while you're dumping it.
> 
> Did you take precautions? Where?
> 
> 	static int cr_write_utsns(struct cr_ctx *ctx, struct uts_namespace *uts_ns)
> 	{
> 	        struct cr_hdr h;
> 	        struct cr_hdr_utsns *hh;
> 	        int domainname_len;
> 	        int nodename_len;
> 	        int ret;
> 
> 	        h.type = CR_HDR_UTSNS;
> 	        h.len = sizeof(*hh);
> 
> 	        hh = cr_hbuf_get(ctx, sizeof(*hh));
> 	        if (!hh)
> 	                return -ENOMEM;
> 
> 	        nodename_len = strlen(uts_ns->name.nodename) + 1;
> 	        domainname_len = strlen(uts_ns->name.domainname) + 1;
> 
> 	        hh->nodename_len = nodename_len;
> 	        hh->domainname_len = domainname_len;
> 
> 	        ret = cr_write_obj(ctx, &h, hh);
> 	        cr_hbuf_put(ctx, sizeof(*hh));
> 	        if (ret < 0)
> 	                return ret;
> 
> 	        ret = cr_write_string(ctx, uts_ns->name.nodename, nodename_len);
> 	        if (ret < 0)
> 	                return ret;
> 
> 	        ret = cr_write_string(ctx, uts_ns->name.domainname, domainname_len);
> 	        return ret;
> 	}
> 
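> You should take uts_sem (for reading) around both strlen() calls and both
> string writes, so that the lengths in the header match the strings dumped.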
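A minimal userspace sketch of that fix, for illustration only: a POSIX rwlock stands in for uts_sem (in the kernel, sethostname() takes uts_sem with down_write(), so a dumper would use down_read()/up_read()), and the names fake_uts_ns, fake_set_names() and fake_snapshot_utsns() are hypothetical. The point is that both strings and their lengths are captured in one critical section, and the image is then written from that private copy.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <string.h>

/* Same size as __NEW_UTS_LEN + 1 in <linux/utsname.h>. */
#define UTS_FIELD_LEN 65

/* Hypothetical userspace stand-in for struct uts_namespace. */
struct fake_uts_ns {
	pthread_rwlock_t sem;            /* plays the role of uts_sem */
	char nodename[UTS_FIELD_LEN];
	char domainname[UTS_FIELD_LEN];
};

/* Consistent copy taken under the lock; lengths include the NUL. */
struct uts_snapshot {
	char nodename[UTS_FIELD_LEN];
	char domainname[UTS_FIELD_LEN];
	int nodename_len;
	int domainname_len;
};

void fake_uts_ns_init(struct fake_uts_ns *ns)
{
	memset(ns, 0, sizeof(*ns));
	pthread_rwlock_init(&ns->sem, NULL);
}

/* Writer side, analogous to sethostname(): takes the lock exclusively. */
int fake_set_names(struct fake_uts_ns *ns, const char *node, const char *domain)
{
	size_t nlen = strlen(node);
	size_t dlen = strlen(domain);

	if (nlen >= UTS_FIELD_LEN || dlen >= UTS_FIELD_LEN)
		return -1;
	pthread_rwlock_wrlock(&ns->sem);
	memcpy(ns->nodename, node, nlen + 1);
	memcpy(ns->domainname, domain, dlen + 1);
	pthread_rwlock_unlock(&ns->sem);
	return 0;
}

/*
 * Checkpoint side: copy both strings in one read-side critical section,
 * so a concurrent writer cannot change them between the length
 * computation and the copy. The image is written from the snapshot.
 */
void fake_snapshot_utsns(struct fake_uts_ns *ns, struct uts_snapshot *snap)
{
	pthread_rwlock_rdlock(&ns->sem);
	memcpy(snap->nodename, ns->nodename, UTS_FIELD_LEN);
	memcpy(snap->domainname, ns->domainname, UTS_FIELD_LEN);
	pthread_rwlock_unlock(&ns->sem);

	snap->nodename_len = (int)strlen(snap->nodename) + 1;
	snap->domainname_len = (int)strlen(snap->domainname) + 1;
}
```

Copying into a snapshot keeps the lock hold time short; holding an rw_semaphore across the actual image writes, as suggested above, would also be correct in the kernel since it may sleep.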
> 
> 
> Example 2: ipcns is shared with the rest of the world
> 
> Consequently, the shm segment is visible outside and live. Someone may
> already have attached to it with shmat(). What will end up in the dumped
> segment content? Anything.
> 
> You should check the struct file refcount (or something similar) and
> disable attaching while dumping.
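A similarly hypothetical userspace sketch of the second suggestion, under stated assumptions: the segment tracks its attach count (loosely modeled on the kernel's shm_nattch), the dumper passes in how many of those attachments belong to frozen tasks inside the checkpoint, and new attaches fail with -EBUSY while the dump is in progress. The struct and function names are made up for illustration; this is not the kernel's shm code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical stand-in for a shm segment. */
struct fake_shm_seg {
	pthread_mutex_t lock;
	int nattch;      /* total attachments, loosely like shm_nattch */
	int dumping;     /* nonzero while a checkpoint reads the contents */
};

void fake_shm_init(struct fake_shm_seg *seg)
{
	memset(seg, 0, sizeof(*seg));
	pthread_mutex_init(&seg->lock, NULL);
}

/* shmat() analogue: refused while a dump is in progress. */
int fake_shm_attach(struct fake_shm_seg *seg)
{
	int ret = 0;

	pthread_mutex_lock(&seg->lock);
	if (seg->dumping)
		ret = -EBUSY;
	else
		seg->nattch++;
	pthread_mutex_unlock(&seg->lock);
	return ret;
}

/* shmdt() analogue. */
void fake_shm_detach(struct fake_shm_seg *seg)
{
	pthread_mutex_lock(&seg->lock);
	if (seg->nattch > 0)
		seg->nattch--;
	pthread_mutex_unlock(&seg->lock);
}

/*
 * Start a dump only if every current attachment belongs to a frozen task
 * inside the checkpoint; an outside attacher could still write through its
 * existing mapping, so blocking new attaches alone is not enough.
 */
int fake_shm_begin_dump(struct fake_shm_seg *seg, int frozen_attachers)
{
	int ret = 0;

	pthread_mutex_lock(&seg->lock);
	if (seg->nattch > frozen_attachers)
		ret = -EBUSY;
	else
		seg->dumping = 1;
	pthread_mutex_unlock(&seg->lock);
	return ret;
}

void fake_shm_end_dump(struct fake_shm_seg *seg)
{
	pthread_mutex_lock(&seg->lock);
	seg->dumping = 0;
	pthread_mutex_unlock(&seg->lock);
}
```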
> 
> 
> Moral: every time you dump something live, you get complications.
> Every single time.
> 
> 
> Sockets and a live netns are the most complex example. I'm not
> prepared to describe it in detail, but people wishing to do C/R with
> "leaks" should be very careful what they wish for.

They should close their sockets before the checkpoint and have some way
to reconnect afterwards. This implies some degree of C/R awareness in the
code being checkpointed.
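One way an application could be made C/R aware along these lines, as a hypothetical sketch: each connection carries a connect callback, a pre-checkpoint hook closes the socket so nothing live is left to dump, and a post-restart hook reconnects. The struct cr_aware_conn and the conn_* functions are invented for illustration, and how the hooks get triggered (a signal, a control socket, a library callback) is left open; it is an assumption, not an existing C/R interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <unistd.h>

/* Hypothetical connection wrapper for a C/R-aware application. */
struct cr_aware_conn {
	int fd;                           /* -1 while disconnected */
	int (*connect_fn)(void *arg);     /* returns a new fd, or -1 on error */
	void *arg;                        /* e.g. the peer address to dial */
};

int conn_open(struct cr_aware_conn *c)
{
	c->fd = c->connect_fn(c->arg);
	return c->fd < 0 ? -1 : 0;
}

/* Call before the checkpoint: leaves no live socket to be dumped. */
void conn_pre_checkpoint(struct cr_aware_conn *c)
{
	if (c->fd >= 0) {
		close(c->fd);
		c->fd = -1;
	}
}

/* Call after restart (or once the checkpoint is done): dial again. */
int conn_post_restart(struct cr_aware_conn *c)
{
	return conn_open(c);
}
```

Any data in flight when the socket is closed is the application's own responsibility to resend after reconnecting.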

-- 
Gregory Kurz                                     gkurz at fr.ibm.com
Software Engineer @ IBM/Meiosys                  http://www.ibm.com
Tel +33 (0)534 638 479                           Fax +33 (0)561 400 420

"Anarchy is about taking complete responsibility for yourself."
        Alan Moore.

_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers