[Devel] Re: [PATCH 2/3] restart debug: add final process tree status

Oren Laadan orenl at librato.com
Thu Oct 1 16:29:49 PDT 2009



Serge E. Hallyn wrote:
> 
> Here:
> 
> From 8cf006a1bf26a4b280841401302c99689d629e0a Mon Sep 17 00:00:00 2001
> From: Serge E. Hallyn <serue at us.ibm.com>
> Date: Thu, 1 Oct 2009 11:09:40 -0400
> Subject: [PATCH 1/1] restart debug: add final process tree status (v2)
> 
> Have tasks in sys_restart keep some status in a list off
> of checkpoint_ctx, and print this info when the checkpoint_ctx
> is freed.
> 
> This version is mainly just ported against ckpt-v18-hallyn.
> 
> Sample output:
> 
> [3519:2:c/r:free_per_task_status:207] 3 tasks registered, nr_tasks was 0 nr_total 0
> [3519:2:c/r:free_per_task_status:210] active pid was 1, ctx->errno 0
> [3519:2:c/r:free_per_task_status:212] kflags 6 uflags 0 oflags 1
> [3519:2:c/r:free_per_task_status:214] task 0 to run was 2
> [3519:2:c/r:free_per_task_status:217] pid 3517
> [3519:2:c/r:free_per_task_status:219] it was coordinator
> [3519:2:c/r:free_per_task_status:227] it was running
> [3519:2:c/r:free_per_task_status:217] pid 3519
> [3519:2:c/r:free_per_task_status:223] it was the root task
> [3519:2:c/r:free_per_task_status:229] it was a normal task
> [3519:2:c/r:free_per_task_status:217] pid 3520
> [3519:2:c/r:free_per_task_status:221] it was a ghost
> 
> Signed-off-by: Serge E. Hallyn <serue at us.ibm.com>

Looks good. I'll massage it a bit and add it. Meanwhile, a
couple of questions:

[...]

> ---
>  checkpoint/restart.c             |  106 ++++++++++++++++++++++++++++++++++++++
>  checkpoint/sys.c                 |   57 ++++++++++++++++++++
>  include/linux/checkpoint_types.h |   20 +++++++
>  3 files changed, 183 insertions(+), 0 deletions(-)
> 
> diff --git a/checkpoint/restart.c b/checkpoint/restart.c
> index b12c8bd..1f356c0 100644
> --- a/checkpoint/restart.c
> +++ b/checkpoint/restart.c
> @@ -26,6 +26,98 @@
>  #include <linux/checkpoint.h>
>  #include <linux/checkpoint_hdr.h>
>  
> +#ifdef CONFIG_CHECKPOINT_DEBUG
> +static struct ckpt_task_status *ckpt_debug_checkin(struct ckpt_ctx *ctx)
> +{
> +	struct ckpt_task_status *s;
> +	s = kmalloc(sizeof(*s), GFP_KERNEL);
> +	if (!s)
> +		return NULL;
> +	s->pid = current->pid;
> +	s->error = 0;
> +	s->flags = RESTART_DBG_WAITING;
> +	if (current == ctx->root_task)
> +		s->flags |= RESTART_DBG_ROOT;
> +	list_add_tail(&s->list, &ctx->per_task_status);
> +	return s;
> +}

The logic would be a bit simpler if you allow check-in to fail
(and then fail the restart) - you then don't need to test for
validity of @s everywhere.
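
To illustrate, here is a rough userspace sketch of that shape, not
the actual patch. The names restart_ctx, debug_checkin(),
debug_log_running() and do_restart(), the flag values, and the
simulate_oom knob are all made up for the example, and malloc()
stands in for kmalloc(). Check-in reports -ENOMEM, the caller fails
the restart right there, and the later helpers can rely on the status
record being present:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical flag values, for illustration only. */
#define DBG_WAITING 0x1u
#define DBG_RUNNING 0x2u

struct task_status {
	int pid;
	int error;
	unsigned int flags;
};

struct restart_ctx {
	struct task_status *status;	/* set by a successful check-in */
};

/* Check in; returns 0 or -ENOMEM. @simulate_oom forces the failure path. */
static int debug_checkin(struct restart_ctx *ctx, int pid, int simulate_oom)
{
	struct task_status *s = simulate_oom ? NULL : malloc(sizeof(*s));

	if (!s)
		return -ENOMEM;
	s->pid = pid;
	s->error = 0;
	s->flags = DBG_WAITING;
	ctx->status = s;
	return 0;
}

/* Runs only after a successful check-in, so no NULL test is needed. */
static void debug_log_running(struct restart_ctx *ctx)
{
	assert(ctx->status);	/* invariant established by check-in */
	ctx->status->flags &= ~DBG_WAITING;
	ctx->status->flags |= DBG_RUNNING;
}

/* Caller: a failed check-in fails the whole restart early. */
static int do_restart(struct restart_ctx *ctx, int pid, int simulate_oom)
{
	int ret = debug_checkin(ctx, pid, simulate_oom);

	if (ret < 0)
		return ret;
	debug_log_running(ctx);
	return 0;
}
```

The trade-off is that a debug-only allocation failure now aborts the
restart, in exchange for dropping the NULL checks from every helper.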

> +
> +static struct ckpt_task_status *getme(struct ckpt_ctx *ctx)
> +{
> +	struct ckpt_task_status *s = NULL;
> +	list_for_each_entry(s, &ctx->per_task_status, list) {
> +		if (s->pid == current->pid)
> +			break;
> +	}
> +	if (!s || s->pid != current->pid)
> +		return NULL;

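Note that here @s is never NULL: when list_for_each_entry() runs to
completion, the cursor is left pointing at the container of the list
head, not at NULL, so the !s test never fires.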
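
One way to sidestep this is to return from inside the loop and fall
through to NULL. Below is a rough userspace sketch of that lookup. The
list helpers are simplified stand-ins for <linux/list.h>, not the
kernel's code, and task_status and find_status() are made-up names
for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Hypothetical, reduced version of the per-task status record. */
struct task_status {
	int pid;
	struct list_head list;
};

/*
 * Return the entry for @pid, or NULL if it is not on the list.
 * A match returns from inside the loop, so the cursor is never
 * inspected after the walk has wrapped around to the head.
 */
static struct task_status *find_status(struct list_head *head, int pid)
{
	struct list_head *pos;

	assert(head);
	for (pos = head->next; pos != head; pos = pos->next) {
		struct task_status *s =
			container_of(pos, struct task_status, list);

		if (s->pid == pid)
			return s;
	}
	return NULL;
}
```

With this shape the caller gets a real NULL for "not found" and there
is no post-loop test on the cursor at all.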

[...]

> @@ -680,11 +772,17 @@ static int do_ghost_task(void)
>  	if (IS_ERR(ctx))
>  		return PTR_ERR(ctx);
>  
> +	ckpt_debug_ghost(ctx);
> +
> +	ckpt_debug_log_running(ctx);
> +
>  	current->flags |= PF_RESTARTING;
>  
>  	ret = wait_event_interruptible(ctx->ghostq,
>  				       all_tasks_activated(ctx) ||
>  				       ckpt_test_ctx_error(ctx));
> +
> +	ckpt_debug_log_error(ctx, 0);

Did you mean s/0/ret/ ? As written, the value returned by
wait_event_interruptible() is not what gets logged.

[...]

> +	list_for_each_entry_safe(s, p, &ctx->per_task_status, list) {
> +		ckpt_debug("pid %d\n", s->pid);
> +		if (s->flags & RESTART_DBG_COORD)
> +			ckpt_debug("it was coordinator\n");
> +		if (s->flags & RESTART_DBG_GHOST)
> +			ckpt_debug("it was a ghost\n");
> +		if (s->flags & RESTART_DBG_ROOT)
> +			ckpt_debug("it was the root task\n");
> +		if (s->flags & RESTART_DBG_WAITING)
> +			ckpt_debug("it was still waiting to run restart\n");
> +		if (s->flags & RESTART_DBG_RUNNING)
> +			ckpt_debug("it was running\n");
> +		if (s->flags & RESTART_DBG_NORMAL)
> +			ckpt_debug("it was a normal task\n");
> +		if (s->flags & RESTART_DBG_FAILED)
> +			ckpt_debug("it finished with error %d\n", s->error);
> +		if (s->flags & RESTART_DBG_FAILED)

s/FAILED/SUCCESS/ in the second test ... :p

[...]

Oren.

