[CRIU] [PATCHv3 2/3] restore: Fix deadlock when helper's child dies
Dmitry Safonov
dsafonov at virtuozzo.com
Thu Jul 20 14:21:26 MSK 2017
On 07/20/2017 10:39 AM, Andrei Vagin wrote:
> On Wed, Jul 19, 2017 at 05:02:53PM +0300, Dmitry Safonov wrote:
>> Since commit ced9c529f687 ("restore: fix race with helpers' kids dying
>> too early"), we block SIGCHLD in helper tasks before CR_STATE_RESTORE.
>> This way we avoided default criu sighandler as it doesn't expect that
>> childs may die.
>>
>> This is very racy as we wait on futex for another stage to be started,
>> but the next stage may start only when all the tasks complete previous
>> stage. If some children of helper dies, the helper may already have
>> blocked SIGCHLD and have started sleeping on the futex. Then the next
>> stage never comes and no one reads a pending SIGCHLD for helper.
>>
>> A customer met this situation on the node, where the following
>> (non-related) problem has occured:
>> Unable to send a fin packet: libnet_write_raw_ipv6(): -1 bytes written (Network is unreachable)
>> Then child criu of the helper has exited with error-code and the
>> lockup has happened.
>>
>> While we could fix it by aborting futex in the end of
>> restore_task_with_children() for each (non-root also) tasks,
>> that would be not completely correct:
>> 1. All futex-waiting tasks will wake up after that and they
>> may not expect that some tasks are on the previous stage,
>> so they will spam into logs with unrelated errors and may
>> also die painfully.
>> 2. Child may die and miss aborting of the futex due to:
>> o segfault
>> o OOM killer
>> o User-sended SIGKILL
>> o Other error-path we forgot to cover with abort futex
>>
>> To fix this deadlock in TASK_HELPER, as suggested-by Kirill,
>> let's check if there are children deaths expected - if there
>> isn't any, don't block SIGCHLD, otherwise wait() and check if
>> death was on expected stage of restore (not CR_STATE_RESTORE).
>>
>> Signed-off-by: Dmitry Safonov <dsafonov at virtuozzo.com>
>> ---
>> criu/cr-restore.c | 75 +++++++++++++++++++++++++++++++++++++++----------------
>> 1 file changed, 53 insertions(+), 22 deletions(-)
>>
>> diff --git a/criu/cr-restore.c b/criu/cr-restore.c
>> index 9143cd310fed..21c5fbf3eacc 100644
>> --- a/criu/cr-restore.c
>> +++ b/criu/cr-restore.c
>> @@ -1207,6 +1207,58 @@ out:
>> return ret;
>> }
>>
>> +static bool child_death_expected(void)
>> +{
>> + struct pstree_item *pi;
>> +
>> + list_for_each_entry(pi, ¤t->children, sibling) {
>> + switch (pi->pid->state) {
>> + case TASK_DEAD:
>> + case TASK_HELPER:
>> + return true;
>> + }
>> + }
>> +
>> + return false;
>> +}
>> +
>
> Could you write a comment for this function?
Ok
>
>> +static int restore_one_helper(void)
>> +{
>> + sigset_t oldmask;
>> + siginfo_t info;
>> +
>> + if (prepare_fds(current))
>> + return -1;
>> +
>> + if (!child_death_expected()) {
>> + restore_finish_stage(task_entries, CR_STATE_RESTORE);
>> + return 0;
>> + }
>> +
>> + if (block_sigmask(&oldmask, SIGCHLD))
>
> block_sigmask(NULL, SIGCHLD) has to work too, oldmask isn't used
Yep, it was over-caution, as I was not sure if on all archs the
parameter of sigprocmask can be NULL.
But man says it can be, so will do.
>> + return -1;
>> +
>> + /* Finish CR_STATE_RESTORE, but do not wait for the next stage. */
>> + futex_dec_and_wake(&task_entries->nr_in_progress);
>> +
>> + if (waitid(P_ALL, 0, &info, WEXITED | WNOWAIT)) {
>> + pr_perror("Failed to wait\n");
>> + return -1;
>> + }
>> +
>> + if (futex_get(&task_entries->start) == CR_STATE_RESTORE) {
>> + pr_err("Child %d died too early\n", info.si_pid);
>> + return -1;
>> + }
>> +
>> + if (wait_on_helpers_zombies()) {
>> + pr_err("Failed to wait on helpers and zombies\n");
>> + return -1;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> static int restore_one_task(int pid, CoreEntry *core)
>> {
>> int i, ret;
>> @@ -1218,32 +1270,11 @@ static int restore_one_task(int pid, CoreEntry *core)
>> else if (current->pid->state == TASK_DEAD)
>> ret = restore_one_zombie(core);
>> else if (current->pid->state == TASK_HELPER) {
>> - sigset_t blockmask, oldmask;
>> -
>> - if (prepare_fds(current))
>> - return -1;
>> -
>> - sigemptyset(&blockmask);
>> - sigaddset(&blockmask, SIGCHLD);
>> -
>> - if (sigprocmask(SIG_BLOCK, &blockmask, &oldmask) == -1) {
>> - pr_perror("Can not set mask of blocked signals");
>> - return -1;
>> - }
>> -
>> - restore_finish_stage(task_entries, CR_STATE_RESTORE);
>> -
>> + ret = restore_one_helper();
>> close_image_dir();
>> close_proc();
>> for (i = SERVICE_FD_MIN + 1; i < SERVICE_FD_MAX; i++)
>> close_service_fd(i);
>> -
>> - if (wait_on_helpers_zombies()) {
>> - pr_err("failed to wait on helpers and zombies\n");
>> - ret = -1;
>> - } else {
>> - ret = 0;
>> - }
>> } else {
>> pr_err("Unknown state in code %d\n", (int)core->tc->task_state);
>> ret = -1;
>> --
>> 2.13.1
>>
>> _______________________________________________
>> CRIU mailing list
>> CRIU at openvz.org
>> https://lists.openvz.org/mailman/listinfo/criu
--
Dmitry
More information about the CRIU
mailing list