[Devel] Re: [PATCH] user-cr: invoke exit system call directly from ckpt_do_feeder
Oren Laadan
orenl at cs.columbia.edu
Mon Nov 30 23:12:51 PST 2009
Nathan Lynch wrote:
> On Thu, 2009-11-26 at 10:10 -0500, Oren Laadan wrote:
>> Nathan Lynch wrote:
>>> The feeder thread can cause the restart process to fail by indirectly
>>> calling exit_group, which sends SIGKILL to all other threads in the
>>> process. If the feeder thread "wins" the race, the restart is
>>> disrupted. A common symptom of this race is the coordinator task
>>> returning from the wait_for_completion_interruptible in
>>> wait_all_tasks_finish with a signal (the SIGKILL) pending.
>> So the clone man page says:
>> ...
>> The main use of clone() is to implement threads: multiple threads
>> of control in a program that run concurrently in a shared memory
>> space.
>> ...
>> When the fn(arg) function application returns, the child process
>> terminates. The integer returned by fn is the exit code for the
>> child process. The child process may also terminate explicitly by
>> calling exit(2) or after receiving a fatal signal.
>> ...
>> (http://www.kernel.org/doc/man-pages/online/pages/man2/__clone2.2.html)
>>
>> I expected "terminates" here to mean invoke the syscall _exit().
>> Clearly this is desirable with CLONE_THREAD,
>
> Calling _exit (as glibc's clone support code does) is clearly
> undesirable for CLONE_THREAD users such as restart.c because _exit calls
> exit_group, terminating the whole thread group. That's kind of the
> whole point of the patch :)
I did say "syscall _exit()" to distinguish it from libc's _exit().
Anyway, we agree on the necessity of the patch.
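To make the patch's approach concrete, here is a minimal sketch, assuming Linux and glibc on a common architecture such as x86-64. The CLONE_THREAD child terminates itself with the raw exit system call via syscall(SYS_exit, 0), so only that task exits and the rest of the thread group keeps running. Returning from the thread function instead would leave the exit path to the libc clone trampoline, which, as described above, went through _exit() and therefore exit_group(). The parent waits for the child the way pthread_join() does, using CLONE_CHILD_CLEARTID and a futex. The names feeder_fn() and run_feeder_thread() are hypothetical and do not come from restart.c.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

#define STACK_SIZE (64 * 1024)

/* Hypothetical state shared between the caller and its sibling thread. */
static volatile int feeder_result;
/* Set by the kernel (CLONE_PARENT_SETTID), zeroed by it on thread exit. */
static pid_t feeder_tid;

/* Hypothetical thread body. No TLS is set up for this raw clone child,
 * so it avoids libc calls that rely on per-thread state. */
static int feeder_fn(void *arg)
{
    feeder_result = *(int *)arg;
    /* exit(2) syscall: terminates only this task, unlike exit_group(2). */
    syscall(SYS_exit, 0);
    return 0; /* not reached */
}

/* Spawn a CLONE_THREAD sibling, wait for it to exit, and return the
 * value it stored, or -1 on error. */
int run_feeder_thread(int value)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    int flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND |
                CLONE_THREAD | CLONE_SYSVSEM |
                CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID;

    /* The stack grows down on common architectures, so pass its top.
     * &value stays valid because we wait for the child below. */
    if (clone(feeder_fn, stack + STACK_SIZE, flags, &value,
              &feeder_tid, NULL, &feeder_tid) == -1) {
        free(stack);
        return -1;
    }

    /* Like pthread_join(): sleep until the kernel clears feeder_tid. */
    for (;;) {
        pid_t t = __atomic_load_n(&feeder_tid, __ATOMIC_ACQUIRE);
        if (t == 0)
            break;
        syscall(SYS_futex, &feeder_tid, FUTEX_WAIT, t, NULL, NULL, 0);
    }

    free(stack);
    return feeder_result;
}
```

Calling run_feeder_thread(42) should return 42 to a caller that is still alive afterwards; had the sibling left through exit_group(), the caller would have been killed along with it.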
>
>
>> but not for regular
>> processes that will want to proceed to the usual glibc exit path
>> (e.g. process atexit() handlers and what-not). Then again, the last thread
>> to exit should also call glibc's exit for the same reason. So
>> that's probably why it's handled this way.
>>
>> This matters for us because our user-space wrapper for eclone()
>> should eventually do what the glibc's clone() wrapper does, instead
>> of calling _exit() directly as it is today...
>
> For compatibility's sake, the user-space eclone wrapper should
> eventually do what glibc's clone support code does, yes -- branch to
> _exit. But I think you've stated the case backwards? Currently the
> eclone wrappers call sys_exit directly (e.g. "li r0,__NR_exit; sc" on
> powerpc).
Exactly: currently our wrapper calls the syscall _exit() directly,
while the behavior users expect is the one libc provides, namely
calling libc's _exit() instead. So we should fix our wrappers.
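A sketch of the suggested wrapper behavior, assuming Linux and glibc: the wrapper's child entry point runs fn(arg) and then leaves through libc's _exit(), passing fn's return value as the exit code, rather than issuing the exit syscall itself. The real user-cr eclone wrappers are per-architecture assembly; here glibc's clone() merely stands in for eclone(), and struct child_start, child_trampoline(), run_child_process() and exit_code_from_arg() are hypothetical names. Since glibc's _exit() invokes exit_group(), this path is right for an ordinary child process but would take down the whole thread group for a CLONE_THREAD child, which is why a thread like the feeder has to exit via the raw syscall on its own.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (64 * 1024)

/* Hypothetical wrapper state: the caller's fn/arg pair. */
struct child_start {
    int (*fn)(void *);
    void *arg;
};

/* Hypothetical child entry point: run fn(arg), then exit through
 * libc's _exit() with fn's return value as the exit status. */
static int child_trampoline(void *p)
{
    struct child_start *cs = p;
    _exit(cs->fn(cs->arg));
}

/* Spawn an ordinary child process (no CLONE_VM or CLONE_THREAD) that
 * signals SIGCHLD on exit, reap it, and return its exit status, or -1
 * on error or abnormal termination. */
int run_child_process(int (*fn)(void *), void *arg)
{
    struct child_start cs = { fn, arg };
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    pid_t pid = clone(child_trampoline, stack + CHILD_STACK_SIZE,
                      SIGCHLD, &cs);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    pid_t reaped = waitpid(pid, &status, 0);
    free(stack);
    if (reaped == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Example fn: use the int pointed to by arg as the exit code. */
static int exit_code_from_arg(void *arg)
{
    return *(int *)arg;
}
```

For example, with int code = 3, run_child_process(exit_code_from_arg, &code) should report 3 as the child's exit status.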
Oren.
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers