[CRIU] [PATCH 2/2] rst: don't hang when SIGCHLD is coalesced
Andrey Wagin
avagin at gmail.com
Tue Jul 21 10:59:23 PDT 2015
2015-07-21 18:15 GMT+03:00 Tycho Andersen <tycho.andersen at canonical.com>:
> On Tue, Jul 21, 2015 at 12:42:26PM +0300, Andrew Vagin wrote:
>> On Mon, Jul 20, 2015 at 11:56:43PM -0600, Tycho Andersen wrote:
>> > +static int wait_zombies(struct task_restore_args *task_args)
>> > +{
>> > + int i;
>> > +
>> > + for (i = 0; i < task_args->zombies_n; i++) {
>> > + if (sys_waitid(P_PID, task_args->zombies[i], NULL, WNOWAIT | WEXITED, NULL) < 0) {
>> > + pr_err("Wait on %d zombie failed\n", task_args->zombies[i]);
>> > + return -1;
>> > + }
>> > + pr_debug("%ld: Collect a zombie with pid %d\n",
>> > + sys_getpid(), task_args->zombies[i]);
>> > + task_entries->nr_threads--;
>> > + task_entries->nr_tasks--;
>> > + futex_dec_and_wake(&task_entries->nr_in_progress);
>> > + }
>>
>> I think we don't need to decriment counters in the loop:
>> task_entries->nr_threads -= task_args->zombies_n;
>> task_entries->nr_tasks -= task_args->zombies_n;
>
> Yep, sounds good.
>
>> > +
>> > + return 0;
>> > +}
>> > +
>> > /*
>> > * The main routine to restore task via sigreturn.
>> > * This one is very special, we never return there
>> > @@ -836,6 +849,8 @@ long __export_restore_task(struct task_restore_args *args)
>> > task_entries = args->task_entries;
>> > helpers = args->helpers;
>> > n_helpers = args->helpers_n;
>> > + zombies = args->zombies;
>> > + n_zombies = args->zombies_n;
>> > *args->breakpoint = rst_sigreturn;
>> >
>> > ksigfillset(&act.rt_sa_mask);
>> > @@ -1190,11 +1205,10 @@ long __export_restore_task(struct task_restore_args *args)
>> >
>> > pr_info("%ld: Restored\n", sys_getpid());
>> >
>> > - futex_set(&zombies_inprogress, args->nr_zombies);
>> > -
>>
>> We can block SIGCHLD here
>>
>> > restore_finish_stage(CR_STATE_RESTORE);
>> >
>> > - futex_wait_while_gt(&zombies_inprogress, 0);
>> > + if (wait_zombies(args) < 0)
>> > + goto core_restore_end;
>> >
>> > if (wait_helpers(args) < 0)
>> > goto core_restore_end;
>>
>> and unblock SIGCHLD back here
>>
>> In thise case, sigchld_handler will be executed only once for all
>> helpers and zombies. What do you think about this?
>
> That sounds ok, except that if one non-zombie dies, it might be
> coalesced with the zombie, so we would miss that it died and not do
> the futex_abort_and_wake() to handle the non-zombie case.
Yes, You are right.
Actually it's another problem which was here before your changes. We
are going to fix it separate from this one.
Acked-by: Andrew Vagin <avagin at openvz.org>
Thanks,
Andrew
>
> In fact, this can happen with the patch as it is now, but if we made
> the change you suggest it makes it more likely. I'm not sure what the
> best way to handle this case is (other than "don't have any bugs" :D).
>
> Tycho
More information about the CRIU
mailing list