[Devel] Re: [PATCH 1/1] cr: define CHECKPOINT_SUBTREE flag and sysctl

Mon Apr 27 13:11:45 PDT 2009

Serge E. Hallyn wrote:
> Quoting Nathan Lynch (ntl at pobox.com):
>> "Serge E. Hallyn" <serge at hallyn.com> writes:
>>> Quoting Nathan Lynch (ntl at pobox.com):
>>>> "Serge E. Hallyn" <serue at us.ibm.com> writes:
>>>>> +	cnt = ref->users + 1;
>>>>> +	switch (ref->type) {
>>>>> +	case CR_OBJ_UTSNS:
>>>>> +		utsns = ref->ptr;
>>>>> +		cnt2 = (unsigned long) atomic_read(&utsns->kref.refcount);
>>>>> +		if (cnt != cnt2) {
>>>>> +			cr_debug("uts namespace leak\n");
>>>> I'm struggling to understand what guarantee a check such as this is
>>>> supposed to be making.  I see that it will catch *some* undesirable
>>>> cases.  But "current refcount equals old refcount" does not imply that
>>>> "refcount has not changed in the meantime".
>>> It's got nothing to do with the refcounts changing.
>>>
>>> It ensures that, at the end of the checkpoint, the resources (utsns
>>> in this case) had no users not accounted for by a checkpointed task.
>>> In other words, there was no information leak.
>> Okay, I had mistakenly believed this code was running in the
>> subtree/non-container case.  I reread your patch description and it
>> indicates that these checks are made only in the case of container
>> checkpoint.  If I'm (finally) understanding the patch correctly, my
>> concern is lessened.  Comparing refcounts is still... unconventional.
> 
> Yes, and there are cases where it won't be usable - for instance if
> opening a procfile increments the resource->use count.  That should
> not be an issue for utsns, ipcns, files, or vmas, afaik.

Actually, one such case is if you have a FIFO - and a task outside the
"container" (for whatever definition we choose) opens that FIFO because
the right thingie is mounted in its (distinct) mounts namespace.

Also, unsure if unix domain sockets (those visible through the file
system, not the "abstract" type) are otherwise isolated as well ?

> 
>>> Now it's possible that at the *start* of the checkpoint there was
>>> another task, not being checkpointed and not frozen, in the utsns,
>>> and it exited before the leaks check took place.
>> [Please excuse the obtuse queries below]
>>
>> In which case the check would fail, yes?  Can this scenario actually
>> occur?  I'm of the understanding that a container must be frozen before
>> proceeding with checkpoint.  If that's correct, how could a task in the
>> container exit in the meantime?
> 
> Heh, because there is no such thing as a 'container'.  There is a set of
> tasks in the same freezer control group, and it's possible that there is
> a task not in that cgroup which is in the same utsname as the rest of
> the tasks in that freezer cgroup.
> 
> Now from my POV the important thing is that:  (1) when userspace does
> the right thing, then it gets the right thing, and (2) when userspace
> does the wrong thing, the kernel doesn't crash.  This goes beyond that,
> aiming for something which probably isn't fully achievable, but still
> may be useful even if only mostly achievable:  namely, warn the user
> when they do the wrong thing, in the believe that not doing so makes
> the implementation "a toy."  While I don't agree with that analysis,
> in the end the only person whose opinion on that matters is the user's :)
> So I'm not opposed to giving them the option.

Amen.

Oren.

_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers