[Devel] Re: [RFC v14][PATCH 53/54] Detect resource leaks forwhole-containercheckpoint

Matt Helsley matthltc at us.ibm.com
Fri May 8 01:12:25 PDT 2009


On Thu, May 07, 2009 at 09:56:22PM -0700, Sukadev Bhattiprolu wrote:
> Oren Laadan [orenl at cs.columbia.edu] wrote:
> | 
> | 
> | Sukadev Bhattiprolu wrote:
> | > Oren Laadan [orenl at cs.columbia.edu] wrote:
> | > | Add a 'users' count to objhash items, and, for a !CHECKPOINT_SUBTREE
> | > | checkpoint, return an error code if the actual objects' counts are
> | > | higher, indicating leaks (references to the objects from a task not
> | > | being checkpointed).  Of course, by this time most of the checkpoint
> | > | image has been written out to disk, so this is purely advisory.  But
> | > | then, it's probably naive to argue that anything more than an advisory
> | > | 'this went wrong' error code is useful.
> | > | 
> | > | The comparison of the objhash user counts to object refcounts as a
> | > | basis for checking for leaks comes from Alexey's OpenVZ-based c/r
> | > | patchset.
> | > | 
> | > | Signed-off-by: Oren Laadan <orenl at cs.columbia.edu>
> | > | ---
> | > |  checkpoint/checkpoint.c    |    8 +++
> | > |  checkpoint/memory.c        |    2 +
> | > |  checkpoint/objhash.c       |  108 +++++++++++++++++++++++++++++++++++++++----
> | > |  include/linux/checkpoint.h |    2 +
> | > |  4 files changed, 110 insertions(+), 10 deletions(-)
> | > | 
> | > | diff --git a/checkpoint/checkpoint.c b/checkpoint/checkpoint.c
> | > | index 4319976..32a0a8e 100644
> | > | --- a/checkpoint/checkpoint.c
> | > | +++ b/checkpoint/checkpoint.c
> | > | @@ -498,6 +498,14 @@ int do_checkpoint(struct ckpt_ctx *ctx, pid_t pid)
> | > |  	if (ret < 0)
> | > |  		goto out;
> | > | 
> | > | +	if (!(ctx->flags & CHECKPOINT_SUBTREE)) {
> | > | +		/* verify that all objects are contained (no leaks) */
> | > | +		if (!ckpt_obj_contained(ctx)) {
> | > | +			ret = -EBUSY;
> | > | +			goto out;
> | > | +		}
> | > | +	}
> | > | +
> | > |  	/* on success, return (unique) checkpoint identifier */
> | > |  	ctx->crid = atomic_inc_return(&ctx_count);
> | > |  	ret = ctx->crid;
> | > | diff --git a/checkpoint/memory.c b/checkpoint/memory.c
> | > | index 7637c1e..5ae2b41 100644
> | > | --- a/checkpoint/memory.c
> | > | +++ b/checkpoint/memory.c
> | > | @@ -687,6 +687,8 @@ static int do_checkpoint_mm(struct ckpt_ctx *ctx, struct mm_struct *mm)
> | > |  			ret = exe_objref;
> | > |  			goto out;
> | > |  		}
> | > | +		/* account for all references through vma/exe_file */
> | > | +		ckpt_obj_users_inc(ctx, mm->exe_file, mm->num_exe_file_vmas);
> | > 
> | > Do we really need to add num_exe_file_vmas here ?
> | > 
> | > A quick look at all callers for added_exe_file_vma() seems to show that
> | > those callers also do a get_file().
> | 
> | Each vma whose file is the same as mm->exe_file causes the refcount
> | of that file to increase by 2: once by vma->vm_file, and once via
> | added_exe_file_vma(). The c/r code calls ckpt_obj_checkpoint() only
> | once, thus once one obj_file_grab() for that file. The code above
> | accounts for the missing count.
> 
> If the executable is shared between a parent and child (as in fork()/dup_mm)
> do we still need to account for the 'added_exe_file_vma()' in the child
> process ?
> 
> i.e I can trace a call to added_exe_file_vma() when loading/mmaping a biniary.
> But I can't trace a call to added_exe_file_vma() during fork()/dup_mm()).

Yes, this is consistent with the common case. The count should usually
only change when exec'ing. That said, it increases as the original vmas
are split or merged as protection/permission bits change or sections are
unmapped altogether.
 
> Here is how I can account for the 16 in the obj->users :-)
> 
> 	Parent:
> 		do_checkpoint_mm: +2	= 2	(first time/obj_new())
> 		num_exe_vmas: +2	= 4

You mean mm->exe_file, not "num_exe_file_vmas"? num_exe_file_vmas
is irrelevant when it comes to checkpoint.

This seems odd: +4 to the obj->users count seems like 2 too many. It
should just be +2 -- once for mm->exe_file and once for the objhash
itself.

> 
> 		filemap_checkpoint: +1	= 5	(text section)
> 		filemap_checkpoint: +1	= 6	(data section)

These filemap_checkpoint counts make perfect sense.

> 
> 	Child:
> 		do_checkpoint_mm: +1	= 7

OK (for the child's mm->exe_file ref)

> 		num_exe_file_vmas: +2	= 9

Bad

> 
> 		filemap_checkpoint: +1	= 10	(text section)
> 		filemap_checkpoint: +1	= 11	(data section)
> 
> 	Grand child:
> 
> 		do_checkpoint_mm: +1	= 12

OK (for the child's mm->exe_file ref)

> 		num_exe_file_vmas: +2	= 14

Bad

> 
> 		filemap_checkpoint: +1	= 15	(text section)
> 		filemap_checkpoint: +1	= 16	(data section)
> 
> Even if we were to drop the num_exe_file_vmas for the child and
> grand-child, we would be off by 2 :-(

Drop all the "num_exe_vmas" increments.

> As of now, I can account for 9 of the 10 found in file->f_count.
> 
> 
> 	Parent:
> 		load_a.out/do_mmap: +2	= 2	(text)
> 		load_aout/do_mmap(): +2	= 4	(data)

(aside: I don't know why these add two each... are you sure there
weren't two more refs by dup_mm_exe_file() in child and grandchild
below?)

These happen during exec. Are you missing this one:

	void set_mm_exe_file(struct mm_struct *mm, struct file *new_exe_file)
	{
		if (new_exe_file)
			get_file(new_exe_file);
	...

	int flush_old_exec(struct linux_binprm * bprm)
	{
		char * name;
		int i, ch, retval;
		char tcomm[sizeof(current->comm)];

		/*
		 * Make sure we have a private signal table and that
		 * we are unassociated from the previous thread group.
		 */
		retval = de_thread(current);
		if (retval)
			goto out;

		set_mm_exe_file(bprm->mm, bprm->file); <----- *here*

		/*
		 * Release all of the old mmap stuff
		 */
		retval = exec_mmap(bprm->mm);
		if (retval)
			goto out;
	...

> 
> 	Child:
> 		dup_mm()/dup_mmap(): +1	= 5	(text)
> 		dup_mm()/dup_mmap(): +1	= 6	(data)

Odd. Should be += 3 for each fork-induced dup_mm():
	1 for text VMA
	1 for data VMA
	1 for mm->exe_file (see dup_mm_exe_file() in fs/proc/base.c)

> 
> 	Grand Child:
> 		dup_mm()/dup_mmap(): +1	= 7	(text)
> 		dup_mm()/dup_mmap(): +1	= 8	(data)
>
> 	Checkpoint/Objhash:
> 
> 		obj_new/obj_file_grab: +1 = 9

By dropping the "num_exe_vmas" additions to obj->users and assuming
I'm right about file->f_count being referenced three times for each
fork, I think that accounts for everything -- file->f_count == obj->users.

> Another question is regarding the obj->users = 2 in obj_new():
> 
> 	- one of this reference is for the get_file() in obj_file_grab()
> 	  called near the end of obj_new() right ?
> 
> 	- where can I find the other get_file() ?

I think in flush_old_exec() above.

Cheers,
	-Matt
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers




More information about the Devel mailing list