[Devel] [RFC][PATCH][cr]: Mark ghost tasks as detached earlier

Sukadev Bhattiprolu sukadev at linux.vnet.ibm.com
Sat Oct 30 00:01:51 PDT 2010


From ce9dd2fc7332597d46872f3f8c52ac0806f381d1 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu <sukadev at linux.vnet.ibm.com>
Date: Fri, 29 Oct 2010 23:16:10 -0700
Subject: [PATCH 1/1] Mark ghost task as detached earlier

During restart() of an application, ghost tasks are marked as "detached"
so they don't send a SIGCHLD to their parent when they exit. But this is
currently being done a little too late in the "life" of the ghost and
ends up confusing the container-init.

Suppose a ghost child of the container-init is waiting in do_ghost_task().
It is not yet detached. If the container-init is terminated for some
reason, the container-init sends SIGKILL to its children (including this
ghost). The container-init then waits for the un-detached children to
exit, expecting to be notified via SIGCHLD.

When the ghost-child receives the SIGKILL, it wakes up and marks itself
detached and proceeds to exit. Since it is now detached, it will not
notify the parent, thus leaving the container-init blocked indefinitely.
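
To make the ordering concrete, here is a toy user-space model of the
sequence above. It is only a sketch, not kernel code: the toy_* names are
made up for illustration, and the container-init bookkeeping is reduced to
two counters under the assumption that init decides whom to wait for at
the moment it sends SIGKILL.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_SIGCHLD 17	/* illustrative value for the normal exit signal */

/* Hypothetical stand-in for the one task field that matters here. */
struct toy_task {
	int exit_signal;	/* -1 models "detached": no SIGCHLD on exit */
};

/* Hypothetical stand-in for the container-init's view of its children. */
struct toy_init {
	int awaited;		/* children init decided to wait for */
	int notified;		/* SIGCHLD notifications actually received */
};

/* init sends SIGKILL and records which children it will wait for. */
static void toy_init_kill_child(struct toy_init *init,
				const struct toy_task *child)
{
	if (child->exit_signal != -1)
		init->awaited++;
}

/* A task notifies its parent on exit only if it is not detached. */
static void toy_task_exit(struct toy_init *init,
			  const struct toy_task *child)
{
	if (child->exit_signal != -1)
		init->notified++;
}

/* Returns true if init ends up waiting for a notification that never comes. */
bool toy_init_hangs(bool detach_early)
{
	struct toy_task ghost = { .exit_signal = TOY_SIGCHLD };
	struct toy_init init = { 0, 0 };

	if (detach_early)
		ghost.exit_signal = -1;	/* detach before the ghost sleeps */

	/* SIGKILL arrives while the ghost is still waiting. */
	toy_init_kill_child(&init, &ghost);

	if (!detach_early)
		ghost.exit_signal = -1;	/* detach only after waking up */

	toy_task_exit(&init, &ghost);
	return init.awaited > init.notified;
}

/* Sanity check of the two orderings. */
void toy_self_check(void)
{
	assert(toy_init_hangs(false));	/* late detach: init is left waiting */
	assert(!toy_init_hangs(true));	/* early detach: nothing to wait for */
}
```

In this model the late ordering leaves init with one awaited child and no
notification, while the early ordering leaves nothing outstanding.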

Some background:

When running some tests on the C/R code we ran into the problem of the
container-init not waiting for detached processes. This problem was
extensively discussed here:

	http://lkml.org/lkml/2010/6/16/295

Eric Biederman had a fix for the problem:

	http://lkml.org/lkml/2010/7/12/213

When I applied this fix to the C/R tree and repeated the tests, I ran
into the above issue of the container-init hanging. Marking the ghost
as detached earlier seems to fix the confusion in the container-init.

Oren, is there a reason not to mark the ghost task detached earlier
than is currently being done?

Signed-off-by: Sukadev Bhattiprolu (sukadev at us.ibm.com)
---
 kernel/checkpoint/restart.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/checkpoint/restart.c b/kernel/checkpoint/restart.c
index 17270b8..95789c0 100644
--- a/kernel/checkpoint/restart.c
+++ b/kernel/checkpoint/restart.c
@@ -953,6 +953,7 @@ static int do_ghost_task(void)
 	struct ckpt_ctx *ctx;
 	int ret;
 
+	current->exit_signal = -1;
 	ctx = wait_checkpoint_ctx();
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
@@ -972,7 +973,6 @@ static int do_ghost_task(void)
 	if (ret < 0)
 		ckpt_err(ctx, ret, "ghost restart failed\n");
 
-	current->exit_signal = -1;
 	restore_debug_exit(ctx);
 	ckpt_ctx_put(ctx);
 	do_exit(0);
-- 
1.6.6.1

_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers