[Devel] Re: [RFC][v3][PATCH 7/7] Define clone_with_pids syscall
Oren Laadan
orenl at cs.columbia.edu
Mon Jun 1 09:58:15 PDT 2009
Oren Laadan wrote:
>
> Serge E. Hallyn wrote:
>> Quoting Sukadev Bhattiprolu (sukadev at linux.vnet.ibm.com):
>>
>> Bear with me as I try to get this straight...
>>
>> Let's say we have the following task being checkpointed (btw, the
>> actual checkpoint code hopefully will, as Oren mentioned, not
>> be always checkpointing all of a task's pids, but only the
>> pids from the checkpointer's pidns level an up - but that doesn't
>> matter to you):
>>
>> Either:
>>
>> level no | pid
>> 0 1001
>> 1 50
>> 2 3
>>
>> or
>>
>> level no | pid
>> 0 1001
>> 1 50
>> 2 1
>>
>> Now we want to restart that inside a container. So for the
>> first task we have a parent task:
>>
>> level no | pid
>> 0 5009
>> 1 1000
>> 2 49
>> 3 2
>>
>> calling clone_with_pid with upids {1001,50,3} to produce task:
>>
>> level no | pid
>> 0 5010
>> 1 1001
>> 2 50
>> 3 3
>>
>> So the numbers in your code for this case become:
>>
>>> + unum_pids = pid_set.num_pids;
>> unum_pids = 3
>>> + knum_pids = task_pid(current)->level + 1;
>> knum_pids = 3 + 1 = 4
>>
>>> + target_pids = kzalloc((knum_pids + 1) * sizeof(pid_t), GFP_KERNEL);
>> XXX First problem:
>> target_pids gets room for 5 pids, one more than we need.
>
> The two issues are related, and this is intentional. The idea
> was that when you use CLONE_NEWPID, you imply a new nesting level
> and a pid==1 there. So you are not supposed to specify the pid
> of that new level.
>
> IOW, the parent specify the pid's of the child from the _parent's_
> level and up (of the desired depth). CLONE_NEWPID creates a new
> pidns level below the parent's, and that is not covered in the
> array of pids.
>
> By allocation an extra slot and forcing it to be 0, we ensure that
> the case of CLONE_NEWPID is covered correctly. Clearly, if this
> flag isn't set, then the extra slot is redundant (but doesn't hurt).
>
>>> + j = knum_pids - unum_pids;
>> j = 1, so we copy the 3 pids in the right place.
>>
>>> + rc = copy_from_user(&target_pids[j], pid_set.target_pids, size);
>>> + if (rc) {
>>> + rc = -EFAULT;
>>> + goto out_free;
>>> + }
>>> +
>>> + return target_pids;
>>
>> For the second one, we have a parent task
>>
>> level no | pid
>> 0 5009
>> 1 1000
>> 2 49
>>
>> calling clone_with_pid with CLONE_NEWPID and {1001,50,1} to produce:
>>
>> level no | pid
>> 0 5010
>> 1 1001
>> 2 50
>> 3 1
>>
>> So the numbers in your code become:
>>
>>> + unum_pids = pid_set.num_pids;
>> unum_pids = 3
>
> This is a "bug" of the parent. The parent should specify the pids
> from the parent's level only and up, and not include the new level
> below that will be created. (After all, it will have to be 1).
Suka: maybe state this explicitly,too, in the patch description
and in the comments in the code ? Like:
"The syscall operates relative to the parent's nesting level. If
CLONE_NEWPID is used as well, it will create an additional level
below. In this case the syscall indicates the desired pids at the
nesting levels seen by the parent."
Oren.
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
More information about the Devel
mailing list