[CRIU] [PATCH 4/4] restore: Fix hang if root task is waiting on zombie
Andrey Vagin
avagin at gmail.com
Tue Dec 11 09:51:12 MSK 2018
On Fri, Dec 07, 2018 at 02:57:12PM +0300, Cyrill Gorcunov wrote:
> When we're waiting for a zombie we might be the last task,
> and the others have already altered the nr_in_progress futex. Thus
> in this situation we should rather use nanosleep or similar
> to relieve syscall pressure. Otherwise we can simply get stuck
> in the restore procedure, waiting for @nr_in_progress forever.
I don't understand this description. Please provide the sequence of
actions that leads to restore getting stuck.
Thanks,
Andrei
>
> Signed-off-by: Cyrill Gorcunov <gorcunov at gmail.com>
> ---
> criu/pie/restorer.c | 36 +++++++++++++++++++++++++++++++-----
> 1 file changed, 31 insertions(+), 5 deletions(-)
>
> diff --git a/criu/pie/restorer.c b/criu/pie/restorer.c
> index d3b459c6c8b1..b4c0eecd44cc 100644
> --- a/criu/pie/restorer.c
> +++ b/criu/pie/restorer.c
> @@ -1162,6 +1162,13 @@ static int wait_helpers(struct task_restore_args *task_args)
>
>  static int wait_zombies(struct task_restore_args *task_args)
>  {
> +	static const uint32_t step_ms = 100;
> +	static const struct timespec req = {
> +		.tv_nsec = step_ms * 1000000,
> +		.tv_sec = 0,
> +	};
> +	struct timespec rem;
> +	uint32_t nr_prints = 10;
>  	int i;
>
> for (i = 0; i < task_args->zombies_n; i++) {
> @@ -1171,12 +1178,31 @@ static int wait_zombies(struct task_restore_args *task_args)
>
>  		ret = sys_waitid(P_PID, task_args->zombies[i], NULL, WNOWAIT | WEXITED, NULL);
>  		if (ret == -ECHILD) {
> -			/* A process isn't reparented to this task yet.
> -			 * Let's wait when someone complete this stage
> -			 * and try again.
> +			/*
> +			 * A zombie is not reparented to us yet, so we need to
> +			 * wait for it. If there are some tasks left then we can
> +			 * use @nr_in_progress to avoid calling waitid too often.
> +			 * But if we're the root process, @nr_in_progress
> +			 * won't be altered, so we use nanosleep with @step_ms
> +			 * to relieve syscall pressure.
>  			 */
> -			futex_wait_while_eq(&task_entries_local->nr_in_progress,
> -							nr_in_progress);
> +			if (nr_in_progress > 1) {
> +				futex_wait_while_eq(&task_entries_local->nr_in_progress,
> +						    nr_in_progress);
> +			} else {
> +				if (nr_prints)
> +					pr_debug("wait_zombies %d (nanosleep %ld %ld)\n",
> +						 task_args->zombies[i], req.tv_sec, req.tv_nsec);
> +				ret = sys_nanosleep((struct timespec *)&req, &rem);
> +				if (ret == -EINTR) {
> +					if (nr_prints)
> +						pr_debug("\twait_zombies %d (nanosleep %ld %ld)\n",
> +							 task_args->zombies[i], rem.tv_sec, rem.tv_nsec);
> +					sys_nanosleep((struct timespec *)&rem, NULL);
> +				}
> +				if (nr_prints)
> +					nr_prints--;
> +			}
>  			i--;
>  			continue;
>  		}
> --
> 2.17.2
>
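For reference, the sleep-and-retry step in the quoted hunk can be sketched
in plain user-space C roughly as below. This is only an illustration under
stated assumptions, not CRIU code: sleep_ms_restart() is a hypothetical
helper name, and it calls libc nanosleep() (which reports errors through
errno) instead of CRIU's sys_nanosleep() (which returns -EINTR directly).
Unlike the hunk, which resumes the sleep only once after an interruption,
the sketch keeps resuming until the whole interval has elapsed.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for nanosleep() under strict -std= modes */
#endif

#include <errno.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper (not part of CRIU): sleep for about @ms
 * milliseconds. If a signal interrupts the sleep, continue with
 * the remaining time reported by nanosleep().
 *
 * Returns 0 once the full interval has elapsed, -1 on any error
 * other than EINTR.
 */
static int sleep_ms_restart(uint32_t ms)
{
	struct timespec req = {
		.tv_sec  = ms / 1000,
		.tv_nsec = (long)(ms % 1000) * 1000000L,
	};
	struct timespec rem;

	while (nanosleep(&req, &rem) == -1) {
		if (errno != EINTR)
			return -1;
		req = rem;	/* interrupted: sleep for what is left */
	}

	return 0;
}
```

A polling loop would call sleep_ms_restart(100) between waitid() attempts,
matching the 100 ms step_ms value used in the patch.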