[Devel] Re: [PATCH] cgroups: defer free css_set

Paul Menage menage at google.com
Fri Nov 21 10:28:44 PST 2008


On Fri, Nov 21, 2008 at 12:49 AM, Lai Jiangshan <laijs at cn.fujitsu.com> wrote:
>
> We free a css_set immediately when its refcount drops to 0 (except in
> cgroup_attach_task()). That destroys data which the read side may still be
> accessing. This patch uses call_rcu() to defer freeing the css_set.
>
> Signed-off-by: Lai Jiangshan <laijs at cn.fujitsu.com>
> ---
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 1164963..22901ff 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -178,6 +178,8 @@ struct css_set {
>         */
>        struct list_head cg_links;
>
> +       struct rcu_head rcu;
> +
>        /*
>         * Set of subsystem states, one for each subsystem. This array
>         * is immutable after creation apart from the init_css_set
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 358e775..ddc10ac 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -252,6 +252,11 @@ static void unlink_css_set(struct css_set *cg)
>        }
>  }
>
> +static void rcu_free_css_set(struct rcu_head *head)
> +{
> +       kfree(container_of(head, struct css_set, rcu));
> +}
> +
>  static void __put_css_set(struct css_set *cg, int taskexit)
>  {
>        int i;
> @@ -281,7 +286,7 @@ static void __put_css_set(struct css_set *cg, int taskexit)
>                }
>        }
>        rcu_read_unlock();
> -       kfree(cg);
> +       call_rcu(&cg->rcu, rcu_free_css_set);
>  }
>
>  /*
> @@ -1267,7 +1277,6 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
>                        ss->attach(ss, cgrp, oldcgrp, tsk);
>        }
>        set_bit(CGRP_RELEASABLE, &oldcgrp->flags);
> -       synchronize_rcu();

I'm reluctant to remove this synchronize_rcu() call. It gives the
property that if you get a pointer to a task's cgroup under RCU
protection, then even if you race with the task moving away to a
different cgroup, no other cgroup_mutex-protected operation can start
until you've finished your RCU section (since the thread that you raced
with is blocked in synchronize_rcu() while holding cgroup_mutex). I'm
pretty sure that some of the cgroups code relies on that property,
although I can't find exactly which bit I'm thinking of.

Also, using call_rcu() for freeing all css_sets seems unnecessary -
the only case that appears to be potentially broken is the one from
cgroup_exit(), since in the other cases the css_set hasn't been
visible via a task->cgroups pointer. So how about making
__put_css_set() do a call_rcu() for the case when taskexit is true,
and a plain kfree() otherwise? That would also reduce the chance of
overloading the RCU system with too many deferred frees.
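
To make the suggestion concrete, here is a rough, untested userspace sketch
of the branching I have in mind. It is not kernel code: call_rcu() is mocked
with a simple pending list, run_pending_callbacks() is a hypothetical stand-in
for the end of a grace period, free() stands in for kfree(), the refcount is a
plain int, and the two counters exist only so the behaviour can be observed.
put_css_set_sketch() and alloc_css_set() are illustrative names, and the
unlinking and cgroup refcount handling done by the real __put_css_set() is
left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins for kernel primitives, for illustration only. */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct rcu_head *pending_callbacks; /* queued until the "grace period" ends */
static int immediate_frees;                /* css_sets freed on the spot */
static int deferred_frees;                 /* css_sets freed from an RCU callback */

/* Mock call_rcu(): queue the callback instead of running it now. */
static void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *head))
{
	head->func = func;
	head->next = pending_callbacks;
	pending_callbacks = head;
}

/* Hypothetical helper: pretend a grace period elapsed and run the queue. */
static void run_pending_callbacks(void)
{
	while (pending_callbacks) {
		struct rcu_head *head = pending_callbacks;

		pending_callbacks = head->next;
		head->func(head);
	}
}

/* Heavily simplified css_set: only the fields this sketch needs. */
struct css_set {
	int refcount;
	struct rcu_head rcu;
};

static struct css_set *alloc_css_set(void)
{
	struct css_set *cg = calloc(1, sizeof(*cg));

	if (cg)
		cg->refcount = 1;
	return cg;
}

static void rcu_free_css_set(struct rcu_head *head)
{
	deferred_frees++;
	free(container_of(head, struct css_set, rcu));
}

/*
 * Drop one reference. On the last put, defer the free only when the
 * css_set may still be reachable through task->cgroups (taskexit != 0);
 * otherwise free it immediately.
 */
static void put_css_set_sketch(struct css_set *cg, int taskexit)
{
	assert(cg->refcount > 0);
	if (--cg->refcount > 0)
		return;

	if (taskexit) {
		call_rcu(&cg->rcu, rcu_free_css_set);
	} else {
		immediate_frees++;
		free(cg);
	}
}
```

With this shape, a put with taskexit == 0 frees the object right away, while
a put with taskexit != 0 leaves it allocated until the queued callback runs,
so only the exit path adds work for RCU.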

Paul
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers



