[CRIU] [PATCH 1/2] restore: Fix deadlock when helper's child dies
Dmitry Safonov
dsafonov at virtuozzo.com
Thu Jul 6 18:58:59 MSK 2017
On 07/06/2017 05:53 PM, Kirill Tkhai wrote:
> On Wed, Jul 05, 2017 at 23:21, Dmitry Safonov wrote:
>> Since commit ced9c529f687 ("restore: fix race with helpers' kids dying
>> too early"), we block SIGCHLD in helper tasks before CR_STATE_RESTORE.
>> This way we avoided default criu sighandler as it doesn't expect that
>> childs may die.
>>
>> This is very racy as we wait on futex for another stage to be started,
>> but the next stage may start only when all the tasks complete previous
>> stage. If some children of helper dies, the helper may already have
>> blocked SIGCHLD and have started sleeping on the futex. Then the next
>> stage never comes and no one reads a pending SIGCHLD for helper.
>>
>> A customer met this situation on the node, where the following
>> (non-related) problem has occured:
>> Unable to send a fin packet: libnet_write_raw_ipv6(): -1 bytes written (Network is unreachable)
>> Then child criu of the helper has exited with error-code and the
>> lockup has happened.
>>
>> While we could fix it by aborting futex in the end of
>> restore_task_with_children() for each (non-root also) tasks,
>> that would be not completely correct:
>> 1. All futex-waiting tasks will wake up after that and they
>> may not expect that some tasks are on the previous stage,
>> so they will spam into logs with unrelated errors and may
>> also die painfully.
>> 2. Child may die and miss aborting of the futex due to:
>> o segfault
>> o OOM killer
>> o User-sended SIGKILL
>> o Other error-path we forgot to cover with abort futex
>>
>> To fix this deadlock in TASK_HELPER, introduce a new handler for
>> SIGCHLD which will check if the child was expected to die.
>> It should be as simple and plain as possible and only abort
>> futex there - we need reentrancy to avoid missing new signals.
>> No wait for helpers, nothing - only futex aborting.
>>
>> Signed-off-by: Dmitry Safonov <dsafonov at virtuozzo.com>
>> ---
>> criu/cr-restore.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
>> 1 file changed, 50 insertions(+), 5 deletions(-)
>>
>> diff --git a/criu/cr-restore.c b/criu/cr-restore.c
>> index f9d9b06db731..0da54ce75d41 100644
>> --- a/criu/cr-restore.c
>> +++ b/criu/cr-restore.c
>> @@ -1178,6 +1178,49 @@ out:
>> return ret;
>> }
>>
>> +static bool is_child_death_expected(pid_t dead_pid)
>> +{
>> + struct pstree_item *pi;
>> +
>> + list_for_each_entry(pi, ¤t->children, sibling) {
>> + if (dead_pid != vpid(pi))
>> + continue;
>
> See [PATCH] restore: Fix waited task pid in criu mailing list
It's sent ~30 minutes ago, but I'll have it in mind.
>
>> +
>> + switch (pi->pid->state) {
>> + case TASK_DEAD:
>> + case TASK_HELPER:
>> + return true;
>> + default:
>> + return false;
>> + }
>> + }
>> +
>> + return false;
>> +}
>> +
>> +/*
>> + * SIGCHLD isn't blocked in this handler.
>> + * Otherwise we may miss some pending SIGCHLD.
>> + * Rather make the handler re-entrant.
>> + */
>> +static void helper_sigchld_handler(int signal, siginfo_t *siginfo, void *data)
>> +{
>> + pid_t pid = siginfo->si_pid;
>> +
>> + if (is_child_death_expected(pid))
>
> It's wrong. Imagine, you just have installed a handler
> and a child exites right after that; but child has not
> called restore_finish_stage() yet. You'll have the same
> deadlock here.
Nope. It's perfectly valid.
Read the code carefully, please:
futex_abort_and_wake() sets nr_in_progress to FUTEX_ABORT_RAW == -1
The root task does restore_switch_stage() which sleeps with
__restore_wait_inprogress_tasks():
> futex_t *np = &task_entries->nr_in_progress;
>
> futex_wait_while_gt(np, participants);
> ret = (int)futex_get(np);
> if (ret < 0) {
> set_cr_errno(get_task_cr_err());
> return ret;
> }
It will switch the stage and abort restoring with an error.
While helper will go to the next stage and exit.
>
>> + return;
>> +
>> + /*
>> + * XXX: Not safe to print from sighandler.
>> + * Need something like pr_atomic().
>> + * Good for a while as in the worst case we'll have
>> + * corrupted messages.
>> + */
>> + pr_info("Error: Unexpected death of %d\n", pid);
>> +
>> + futex_abort_and_wake(&task_entries->nr_in_progress);
>> +}
>> +
>> static int restore_one_task(int pid, CoreEntry *core)
>> {
>> int i, ret;
>> @@ -1189,16 +1232,18 @@ static int restore_one_task(int pid, CoreEntry *core)
>> else if (current->pid->state == TASK_DEAD)
>> ret = restore_one_zombie(core);
>> else if (current->pid->state == TASK_HELPER) {
>> - sigset_t blockmask, oldmask;
>> + struct sigaction act;
>>
>> if (prepare_fds(current))
>> return -1;
>>
>> - sigemptyset(&blockmask);
>> - sigaddset(&blockmask, SIGCHLD);
>> + sigemptyset(&act.sa_mask); /* Don't block SIGCHLD here! */
>> + act.sa_flags = SA_NOCLDSTOP | SA_SIGINFO | SA_RESTART;
>> + act.sa_sigaction = helper_sigchld_handler;
>
> I'd suggested not to introduce new helper and to do the wait
> synchronous with blocked signals, like we already have. At
> this place you are already know, whether you have to wait
> a child, so simple analysing of waitpid(-1) return value
> will show you if the child dead or not. Something like this:
>
> if (need to wait) {
> block signals
> futex dec current stage
> pid = waitpid(-1)
> if (stage == CR_STATE_RESTORE) /* task exited before criu switched the stage */
> error();
> check_pid_is_expected(pid)
> } else {
> restore_finish_stage(task_entries, CR_STATE_RESTORE)
> /* generic sighandler will catch unexpected child */
> }
Sucks.
You complicate this part which may be a can full of bugs later.
>
> Asynchronous events are need for something unexpected,
> and they make the code more complicated. Also, if someone
> decides to rework generic sig handler and to use it
> after this lines, he will have to restore it later.
> In my opinion, it's better to have generic handlers
> like we have where it's possible, while this place
> is easily may be written without changing them.
Your proposal expects that helpers may die *only* on the
following stage, which may be changed much more likely.
And it's tough bond.
>
>>
>> - if (sigprocmask(SIG_BLOCK, &blockmask, &oldmask) == -1) {
>> - pr_perror("Can not set mask of blocked signals");
>> + ret = sigaction(SIGCHLD, &act, NULL);
>> + if (ret < 0) {
>> + pr_perror("sigaction() failed");
>> return -1;
>> }
>>
>> --
>> 2.13.1
>>
>> _______________________________________________
>> CRIU mailing list
>> CRIU at openvz.org
>> https://lists.openvz.org/mailman/listinfo/criu
--
Dmitry
More information about the CRIU
mailing list