[Devel] Re: [RFC v14][PATCH 53/54] Detect resource leaks for whole-container checkpoint

Oren Laadan orenl at cs.columbia.edu
Wed May 6 21:11:41 PDT 2009



Sukadev Bhattiprolu wrote:
> Oren Laadan [orenl at cs.columbia.edu] wrote:
> | Add a 'users' count to objhash items, and, for a !CHECKPOINT_SUBTREE
> | checkpoint, return an error code if the actual objects' counts are
> | higher, indicating leaks (references to the objects from a task not
> | being checkpointed).  Of course, by this time most of the checkpoint
> | image has been written out to disk, so this is purely advisory.  But
> | then, it's probably naive to argue that anything more than an advisory
> | 'this went wrong' error code is useful.
> | 
> | The comparison of the objhash user counts to object refcounts as a
> | basis for checking for leaks comes from Alexey's OpenVZ-based c/r
> | patchset.
> | 
> | Signed-off-by: Oren Laadan <orenl at cs.columbia.edu>
> | ---
> |  checkpoint/checkpoint.c    |    8 +++
> |  checkpoint/memory.c        |    2 +
> |  checkpoint/objhash.c       |  108 +++++++++++++++++++++++++++++++++++++++----
> |  include/linux/checkpoint.h |    2 +
> |  4 files changed, 110 insertions(+), 10 deletions(-)
> | 
> | diff --git a/checkpoint/checkpoint.c b/checkpoint/checkpoint.c
> | index 4319976..32a0a8e 100644
> | --- a/checkpoint/checkpoint.c
> | +++ b/checkpoint/checkpoint.c
> | @@ -498,6 +498,14 @@ int do_checkpoint(struct ckpt_ctx *ctx, pid_t pid)
> |  	if (ret < 0)
> |  		goto out;
> | 
> | +	if (!(ctx->flags & CHECKPOINT_SUBTREE)) {
> | +		/* verify that all objects are contained (no leaks) */
> | +		if (!ckpt_obj_contained(ctx)) {
> | +			ret = -EBUSY;
> | +			goto out;
> | +		}
> | +	}
> | +
> |  	/* on success, return (unique) checkpoint identifier */
> |  	ctx->crid = atomic_inc_return(&ctx_count);
> |  	ret = ctx->crid;
> | diff --git a/checkpoint/memory.c b/checkpoint/memory.c
> | index 7637c1e..5ae2b41 100644
> | --- a/checkpoint/memory.c
> | +++ b/checkpoint/memory.c
> | @@ -687,6 +687,8 @@ static int do_checkpoint_mm(struct ckpt_ctx *ctx, struct mm_struct *mm)
> |  			ret = exe_objref;
> |  			goto out;
> |  		}
> | +		/* account for all references through vma/exe_file */
> | +		ckpt_obj_users_inc(ctx, mm->exe_file, mm->num_exe_file_vmas);
> 
> Do we really need to add num_exe_file_vmas here ?
> 
> A quick look at all callers for added_exe_file_vma() seems to show that
> those callers also do a get_file().

Each vma whose file is the same as mm->exe_file causes the refcount
of that file to increase by 2: once by vma->vm_file, and once via
added_exe_file_vma(). The c/r code calls ckpt_obj_checkpoint() only
once, thus once one obj_file_grab() for that file. The code above
accounts for the missing count.

> 
> Anyway, when I try to C/R a simple process tree, I get -EBUSY because
> ckpt_obj_contained() finds a ref-count mismatch for the executable file.

Are you certain the culprit is this count and not, possibly, some
other "leak" ?  If so, do you know what's the count difference ?
Posting the test program and a description of what you did would
be useful...

> 
> I suspect the above increment of 'num_exe_file_vmas' is causing 
> 'obj->users' to exceed the value returned by the file's ->ref_users().
> 
> Or, when incrementing by 'num_exe_file_vmas', should we also call
> obj_file_grab() ('num_exe_file_vmas' times) to keep the ref counts
> in sync ?
> 

The objhash only keeps a single refocunt of the objects that it holds
(accounted for in the comparison already). It also keeps users count
that accounts for the number of times it was observed while doing the
checkpoint. If there are no leaks, then "users + 1 = real-ref-count".

Oren.


_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers




More information about the Devel mailing list