[Devel] Re: pidns memory leak

Eric W. Biederman ebiederm at xmission.com
Fri Oct 9 19:08:44 PDT 2009


Sukadev Bhattiprolu <sukadev at linux.vnet.ibm.com> writes:

> Eric W. Biederman [ebiederm at xmission.com] wrote:
> | Sukadev Bhattiprolu <sukadev at linux.vnet.ibm.com> writes:
> | 
> | > Andrea,
> | >
> | > We have been running into a leak in child pid namespaces, and some early debugging
> | > points to the following commit:
> | >
> | >>> 	commit 7766755a2f249e7e0dabc5255a0a3d151ff79821
> | >>> 	Author: Andrea Arcangeli <andrea at suse.de>
> | >>> 	Date:   Mon Feb 4 22:29:21 2008 -0800
> | >>>
> | >
> | > Reverting the commit seems to fix the leak but we need to do some more
> | > analysis (like the lstat() question Daniel has).
> | 
> | Yes.
> | 
> | That entire path is an optimization.  It should not be needed for correct
> | operation, although it may be responsible for some false positives.
> | 
> | > However I have a basic question regarding the commit - the log mentions:
> | >
> | > 	> do_exit->release_task->mark_inode_dirty_sync->schedule() (will never
> | > 	> come back to run journal_stop)
> | >
> | > But release_task() calls shrink_dcache_parent() for a _procfs_ dentry. Does
> | > journal_stop() apply to procfs also?
> | 
> | The problem when that PF_EXITING check was introduced was that
> | shrink_dcache_parent could shrink dcache entries for other
> | filesystems.  Last I looked, that is no longer the case, and we can
> | remove that code.
>
> Ok.
>
> | As I recall proc_flush_task_mnt has a few other minor bugs as well that
> | could cause problems.
>
> Can you give me some more details on those bugs? Reverting the commit
> seems to fix the problem.
>
> | 
> | Ultimately what problems are you seeing?
>
> We are leaking 'struct pid', proc_inode, and 'struct pid_namespace' when
> container-init exits before its descendant processes. That is, when
> container-init zaps its descendants and waits for them, it calls
> proc_flush_task_mnt() but then misses the shrink_dcache_parent() call due
> to the above commit.
>
> So the proc_inode is never deleted and the references to struct pid and
> pid_namespace never go away. Details of the leak are buried in the
> previous mail...
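A toy Python model of the reference chain described above may help. It is not kernel code: Pid, PidNamespace, ProcInode, flush_task() and the skip_if_exiting switch are hypothetical stand-ins, and the model assumes, as the thread suggests, that the commit makes proc_flush_task_mnt() skip shrink_dcache_parent() when the caller (here the exiting container-init) has PF_EXITING set. It only counts references.

```python
from dataclasses import dataclass


@dataclass
class PidNamespace:
    refs: int = 0  # one reference per live struct pid in this namespace


@dataclass
class Pid:
    ns: PidNamespace
    refs: int = 1  # reference held by the task itself

    def __post_init__(self):
        self.ns.refs += 1  # a struct pid pins its pid namespace

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            self.ns.refs -= 1  # last pid reference gone: namespace ref dropped


@dataclass
class ProcInode:
    pid: Pid

    def __post_init__(self):
        self.pid.refs += 1  # the cached /proc/<pid> inode holds a struct pid ref


def flush_task(dcache: dict, pid_nr: int, caller_exiting: bool,
               skip_if_exiting: bool) -> None:
    """Hypothetical proc_flush_task_mnt(): evict the cached /proc/<pid>
    inode unless the assumed PF_EXITING guard fires."""
    if skip_if_exiting and caller_exiting:
        return  # shrink skipped: the inode (and its pid ref) stays cached
    inode = dcache.pop(pid_nr, None)
    if inode is not None:
        inode.pid.put()  # evicting the inode drops its struct pid reference


def run(skip_if_exiting: bool) -> PidNamespace:
    ns = PidNamespace()
    pid = Pid(ns)
    dcache = {42: ProcInode(pid)}  # someone looked at /proc/42 earlier
    pid.put()                      # task reaped: its own reference goes away
    # container-init reaps the task while it is itself exiting
    flush_task(dcache, 42, caller_exiting=True,
               skip_if_exiting=skip_if_exiting)
    return ns
```

With the guard in place, run(skip_if_exiting=True) leaves the namespace with one reference, held through the still-cached proc_inode; with the guard removed, the count drops to zero.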

It should be the case that bloating up the dcache enough to get a general
shrink_dcache from the memory reclaim code will free the proc_inode and
the associated data structures.  struct pid is supposed to be small and
safe to leak in rare circumstances.

It should be possible to trigger this condition by:
  1. creating a pid namespace,
  2. doing cd /proc/<pid>/ (where <pid> is some process in that pid namespace),
  3. terminating that pid namespace.

But you are still actively using the proc_inode and the struct pid for the
process that has been killed, because a process has it as its current
working directory.
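The two points above (memory reclaim should eventually free an unused /proc inode, while a working directory inside /proc/<pid>/ keeps it in use) can be sketched with another toy model. This is hypothetical, not the kernel's dcache: CachedProcEntry, reclaim() and scenario() are invented names, and "users" loosely stands for the dentry's active references.

```python
from dataclasses import dataclass


@dataclass
class CachedProcEntry:
    """Hypothetical stand-in for a cached /proc/<pid> dentry plus proc_inode."""
    pid_refs_held: int = 1  # the inode's reference on struct pid
    users: int = 0          # active users, e.g. a process whose cwd is here


def reclaim(dcache: dict) -> int:
    """Toy memory-reclaim pass: drop every cached entry nobody is using.
    Returns how many struct pid references were released."""
    released = 0
    for nr in list(dcache):  # copy keys so we can delete while iterating
        entry = dcache[nr]
        if entry.users == 0:
            released += entry.pid_refs_held
            del dcache[nr]
    return released


def scenario(cwd_inside: bool) -> tuple:
    dcache = {42: CachedProcEntry()}
    if cwd_inside:
        # e.g. a shell did "cd /proc/42/" before the namespace was killed
        dcache[42].users += 1
    released = reclaim(dcache)
    return released, len(dcache)
```

Only the unused entry is reclaimed: scenario(cwd_inside=False) releases the pid reference and empties the cache, while with cwd_inside=True the entry survives together with the struct pid reference it holds, because it is still legitimately in use.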

Eric

_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers



