[Devel] Re: Testing lxc 0.6.5 in Fedora 13
Eric W. Biederman
ebiederm at xmission.com
Tue Apr 6 08:13:13 PDT 2010
Matt Helsley <matthltc at us.ibm.com> writes:
> On Mon, Apr 05, 2010 at 08:44:43PM -0700, Roland McGrath wrote:
>> (I've been away for a couple of weeks.)
>> I concur with the things Oleg's said in this thread.
>>
>> As to what's "correct" for the kernel in theory, it would certainly make
>> sense to clean up the ptrace cases to use the tracer (parent) pid_ns when
>> reporting any PID as such. The wait and SIGCHLD code already does this, so
>> that would be consistent. Off hand I don't see anything other than
>> tracehook_report_clone{,_complete}() that sees the wrong value now.
>
> Yup.
>
>> Fixing that requires a bit of hair. The simple and clean approach is to
>> just have the tracehook calls (i.e. ptrace layer) extract the PID from the
>> task_struct using the desired pid_ns. The trouble there is that the
>
> It's also possible to take an extra reference to the struct pid and pass
> that to the tracehook. That and the pid_ns of the tracer receiving the pid
> is all we'll ever need inside the tracehook layer. The only advantage, I
> think, is we wouldn't pin the task struct while holding the pid reference.
>
>> tracehook_report_clone_complete() call is made when that task_struct is no
>> longer guaranteed to be valid. The contrary approach of extracting the
>> appropriate value for the tracer earlier breaks the clean layering because
>> the fork.c code really should not know at all that ->parent->nsproxy is the
>> place to look for what values to pass to tracehook calls. I guess the
>> simple and clean fix is to get_task_struct() before wake_up_new_task()
>> and put_task_struct() after tracehook_report_clone_complete(). That does
>> add some gratuitous atomic incr/decr overhead, though.
>
> Also true.
>
> Of course my suggestion of holding the pid reference won't avoid adding
> atomic ops -- just changes which refcount they work on.
>
>>
>> None of this has much of anything to do with strace, of course. As I've
>> said, I don't see anything other than the PTRACE_GETEVENTMSG value for
>> PTRACE_EVENT_{CLONE,FORK,VFORK} reports that is wrong in the kernel. As
>> Oleg said, strace doesn't use that at all. (This is not the place to
>> discuss the details of strace further.)
>
> Also, looking at proposed changes (utrace and Eric Biederman's setns())
> storing a pid nr rather than a reference to a task struct or struct pid
> probably won't be correct.
My setns work has demonstrated that even for entering a namespace we
never ever need to change the struct pid of a task. setns has no other
bearing on this problem then to say there is no foreseeable reason to
change the rules.
> In the case of Eric Biederman's setns(), if capable of changing pid namespace,
> we could have:
>
> Traced Tracer
> fork()
> ... (an arbitrary amount of time passes)
> setns()
> ptrace(GETEVENTMSG)
Forget that. The pid namespace was architected so that we can ptrace a process
in another pid namespace.
> At which point returning a static pid number held in the message field
> produces the wrong pid.
No. A processes always sees pids from the context of it's original pid
namespace. All setns does is affect which pid namespace children will
be native in.
Eric
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
More information about the Devel
mailing list