[Devel] Re: [PATCH 1/1] cr: define CHECKPOINT_SUBTREE flag and sysctl

Mon Apr 27 11:07:17 PDT 2009

Quoting Nathan Lynch (ntl at pobox.com):
> "Serge E. Hallyn" <serge at hallyn.com> writes:
> > Quoting Nathan Lynch (ntl at pobox.com):
> >> "Serge E. Hallyn" <serue at us.ibm.com> writes:
> >> > +	cnt = ref->users + 1;
> >> > +	switch (ref->type) {
> >> > +	case CR_OBJ_UTSNS:
> >> > +		utsns = ref->ptr;
> >> > +		cnt2 = (unsigned long) atomic_read(&utsns->kref.refcount);
> >> > +		if (cnt != cnt2) {
> >> > +			cr_debug("uts namespace leak\n");
> >> 
> >> I'm struggling to understand what guarantee a check such as this is
> >> supposed to be making.  I see that it will catch *some* undesirable
> >> cases.  But "current refcount equals old refcount" does not imply that
> >> "refcount has not changed in the meantime".
> >
> > It's got nothing to do with the refcounts changing.
> >
> > It ensures that, at the end of the checkpoint, the resources (utsns
> > in this case) had no users not accounted for by a checkpointed task.
> > In other words, there was no information leak.
> 
> Okay, I had mistakenly believed this code was running in the
> subtree/non-container case.  I reread your patch description and it
> indicates that these checks are made only in the case of container
> checkpoint.  If I'm (finally) understanding the patch correctly, my
> concern is lessened.  Comparing refcounts is still... unconventional.

Yes, and there are cases where it won't be usable - for instance if
opening a procfile increments the resource->use count.  That should
not be an issue for utsns, ipcns, files, or vmas, afaik.

> > Now it's possible that at the *start* of the checkpoint there was
> > another task, not being checkpointed and not frozen, in the utsns,
> > and it exited before the leaks check took place.
> 
> [Please excuse the obtuse queries below]
> 
> In which case the check would fail, yes?  Can this scenario actually
> occur?  I'm of the understanding that a container must be frozen before
> proceeding with checkpoint.  If that's correct, how could a task in the
> container exit in the meantime?

Heh, because there is no such thing as a 'container'.  There is a set of
tasks in the same freezer control group, and it's possible that there is
a task not in that cgroup which is in the same utsname as the rest of
the tasks in that freezer cgroup.

Now from my POV the important thing is that:  (1) when userspace does
the right thing, then it gets the right thing, and (2) when userspace
does the wrong thing, the kernel doesn't crash.  This goes beyond that,
aiming for something which probably isn't fully achievable, but still
may be useful even if only mostly achievable:  namely, warn the user
when they do the wrong thing, in the believe that not doing so makes
the implementation "a toy."  While I don't agree with that analysis,
in the end the only person whose opinion on that matters is the user's :)
So I'm not opposed to giving them the option.

-serge
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers