[CRIU] Re: [PATCH] pidns: remove recursion from free_pid_ns (v3)

Xiaotian Feng xtfeng at gmail.com
Wed Oct 10 05:12:21 EDT 2012


On Wed, Oct 10, 2012 at 3:49 PM, Greg KH <greg at kroah.com> wrote:
> On Tue, Oct 09, 2012 at 12:08:31PM -0700, Andrew Morton wrote:
>> On Tue, 9 Oct 2012 12:03:00 -0700
>> Greg KH <greg at kroah.com> wrote:
>>
>> > On Tue, Oct 09, 2012 at 11:48:21AM -0700, Andrew Morton wrote:
>> > > On Sat,  6 Oct 2012 23:56:33 +0400
>> > > Andrew Vagin <avagin at openvz.org> wrote:
>> > >
>> > > > Here is a stack trace of recursion:
>> > > > free_pid_ns(parent)
>> > > >   put_pid_ns(parent)
>> > > >     kref_put(&ns->kref, free_pid_ns);
>> > > >       free_pid_ns
>> > > >
>> > > > This patch turns recursion into loops.
>> > > >
>> > > > pidns can be nested many times, so in case of recursion
>> > > > a simple user space program can provoke a kernel panic
>> > > > due to exceed of a kernel stack.
>> > >
>> > > So we should backport this into earlier kernels.
>> > >
>> > > > --- a/include/linux/kref.h
>> > > > +++ b/include/linux/kref.h
>> > > > @@ -95,6 +95,18 @@ static inline int kref_put(struct kref *kref, void (*release)(struct kref *kref)
>> > > >         return kref_sub(kref, 1, release);
>> > > >  }
>> > > >
>> > > > +/**
>> > > > + * kref_put - decrement refcount for object.
>> > > > + * @kref: object.
>> > > > + *
>> > > > + * Decrement the refcount.
>> > > > + * Return 1 if refcount is zero.
>> > > > + */
>> > > > +static inline int __kref_put(struct kref *kref)
>> > > > +{
>> > > > +       return atomic_dec_and_test(&kref->refcount);
>> > > > +}
>> > >
>> > > Greg might be interested in this.
>> > >
>> > > It's a pretty specialised thing and perhaps it needs some stern words
>> > > in the description explaining when and why it should and shouldn't be
>> > > used.
>> > >
>> > > I wonder if people might (ab)use this to avoid the "doesn't
>> > > have a release function" warning.
>> >
>> > Yes they would, please don't do this at all.
>> >
>> > In fact, why is it needed?  It doesn't solve anything (if it does,
>> > something in the way the kref is being used is wrong.)
>> >
>>
>> It's right there in the changelog.  The patch fixes deep
>> kref_put->release->kref_put recursion by turning the operation for
>> pidns into a loop.
>
> But why would a kref release function ever decrement the same kref
> again causing a loop in the first place?
>

This should be kref_put ns->parent recursionly,  put_pid_ns(ns) ->
free_pid_ns(ns) -> put_pid_ns(ns->parent)->.......
If users create many nested namespaces, the recursion put_pid_ns()
will exausted the kernel stack.


> That's what I was referring to.  This strongly sounds like a problem in
> how the kref is being used, not in the kref code itself.
>
> Is a kref even the correct thing here?

Can we fix this by this way? free_pid_ns just release ns itself, we check
the return value of kref_put, if kref_put returns 1, means ns->kref is removed,
then we kref_put(ns->parent).

diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 00474b0..2168535 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -53,8 +53,14 @@ extern int reboot_pid_ns(struct pid_namespace
*pid_ns, int cmd);

 static inline void put_pid_ns(struct pid_namespace *ns)
 {
-	if (ns != &init_pid_ns)
-		kref_put(&ns->kref, free_pid_ns);
+	int ret;
+      struct pid_namespace *parent
+
+	while (ns != &init_pid_ns) {
+              parent = ns->parent;
+		ret = kref_put(&ns->kref, free_pid_ns);
+		if (!ret)
+			break;
+		else ns = parent;
+	}
 }

 #else /* !CONFIG_PID_NS */
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 478bad2..85ca23a 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -135,15 +135,11 @@ struct pid_namespace *copy_pid_ns(unsigned long
flags, struct pid_namespace *old

 void free_pid_ns(struct kref *kref)
 {
-	struct pid_namespace *ns, *parent;
+	struct pid_namespace *ns;

 	ns = container_of(kref, struct pid_namespace, kref);

-	parent = ns->parent;
 	destroy_pid_namespace(ns);
-
-	if (parent != NULL)
-		put_pid_ns(parent);
 }
 EXPORT_SYMBOL_GPL(free_pid_ns);


>
> greg k-h
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


More information about the CRIU mailing list