[Devel] [PATCH vz7] mm/slab: use monotonic serial number for naming per memcg caches
Kirill Tkhai
ktkhai at virtuozzo.com
Mon Oct 15 12:18:36 MSK 2018
On 15.10.2018 12:12, Konstantin Khorenko wrote:
> On 10/15/2018 11:58 AM, Kirill Tkhai wrote:
>> On 12.10.2018 18:54, Konstantin Khorenko wrote:
>>> Currently, we use mem_cgroup->kmemcg_id to guarantee kmem_cache->name
>>> uniqueness. Unfortunately kmemcg_id release happens on cgroup offline
>>> when kmem caches are still alive and their sysfs entries are not removed.
>>>
>>> Thus it's possible to start/suspend/resume/stop a Container and get into
>>> situation when mem cgroup is offline (thus mem_cgroup::kmemcg_id is
>>> released), but not freed and corresponding slab caches are alive along
>>> with their sysfs entries like:
>>> /sys/kernel/slab/filp/cgroup/filp($KMEMCG_ID:$MEMCG_NAME)
>>>
>>> If we start a Container after that, new memory cgroup is created and
>>> gets same kmemcg_id, new kmem_cache is created and sysfs_slab_add()
>>> fails due to duplicate name provided which triggers a warning:
>>>
>>> WARNING: CPU: 0 PID: 31346 at lib/kobject.c:239
>>> kobject_add_internal+0x6ef/0x910
>>> kobject_add_internal failed for kmalloc-512(3:100) with -EEXIST, don't
>>> try to register things with the same name in the same directory.
>>>
>>> In older kernels we use css_id(mem_cgroup_css(memcg)), but now cssid is
>>> dropped.
>>> In mainstream monotonic counter css->serial_nr is used instead,
>>> so let's add monotonic serial_nr as well, but as nobody needs it at the
>>> moment except memcg, let it be part of struct mem_cgroup.
>>
>> Looks good. Please, look at small nits below.
>>
>>> https://jira.sw.ru/browse/PSBM-89047
>>>
>>> Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
>>> ---
>>> include/linux/memcontrol.h | 11 +++++++++++
>>> mm/memcontrol.c | 18 ++++++++++++++++++
>>> mm/slab_common.c | 12 ++++++++++--
>>> 3 files changed, 39 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>>> index 640a5802e398..56eb202ac654 100644
>>> --- a/include/linux/memcontrol.h
>>> +++ b/include/linux/memcontrol.h
>>> @@ -549,6 +549,8 @@ bool __memcg_kmem_newpage_charge(struct page *page, gfp_t gfp, int order);
>>> void __memcg_kmem_uncharge_pages(struct page *page, int order);
>>>
>>> int memcg_cache_id(struct mem_cgroup *memcg);
>>> +void memcg_set_serial_nr(struct mem_cgroup *memcg, u64 val);
>>> +u64 memcg_get_serial_nr(struct mem_cgroup *memcg);
>>>
>>> struct kmem_cache *
>>> __memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp);
>>> @@ -685,6 +687,15 @@ static inline int memcg_cache_id(struct mem_cgroup *memcg)
>>> return -1;
>>> }
>>>
>>> +static inline void memcg_set_serial_nr(struct mem_cgroup *memcg, u64 val)
>>> +{
>>> +}
>>> +
>>> +static inline u64 memcg_get_serial_nr(struct mem_cgroup *memcg)
>>> +{
>>> + return 0;
>>> +}
>>> +
>>> static inline void memcg_get_cache_ids(void)
>>> {
>>> }
>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>> index e57263fea566..2061d5b116bf 100644
>>> --- a/mm/memcontrol.c
>>> +++ b/mm/memcontrol.c
>>> @@ -404,6 +404,14 @@ struct mem_cgroup {
>>> #if defined(CONFIG_MEMCG_KMEM)
>>> /* Index in the kmem_cache->memcg_params.memcg_caches array */
>>> int kmemcg_id;
>>> + /*
>>> + * Monotonically increasing unique serial number which defines a
>>> + * uniform order among all mem cgroups.
>>> + * It's used to construct unique kmem_cache name which should remain
>>> + * unique after mem cgroup offline when kmemcg_id is released.
>>> + */
>>> + u64 serial_nr;
>>> +
>>> /* List of memcgs sharing the same kmemcg_id */
>>> struct list_head kmemcg_sharers;
>>> #endif
>>> @@ -3432,6 +3440,16 @@ int memcg_cache_id(struct mem_cgroup *memcg)
>>> return memcg ? memcg->kmemcg_id : -1;
>>> }
>>>
>>> +void memcg_set_serial_nr(struct mem_cgroup *memcg, u64 val)
>>> +{
>>> + memcg->serial_nr = val;
>>> +}
>>> +
>>> +u64 memcg_get_serial_nr(struct mem_cgroup *memcg)
>>> +{
>>> + return memcg->serial_nr;
>>> +}
>>
>> Do we really need this struct mem_cgroup::serial_nr and the get/set primitives?
>> Are we going to use them for something else in the future?
>
> Not really.
> i've added those get/set primitives because otherwise gcc complained about incomplete type "struct mem_cgroup",
> at the moment mem_cgroup is defined in mm/memcontrol.c and i didn't want to move it to header,
> so get/set wrappers provide a smaller patch.
I mean, we can avoid both the primitives and serial_nr field and to keep all the business
inside memcg_create_kmem_cache() function.
>>
>>> +
>>> static int memcg_alloc_cache_id(void)
>>> {
>>> int id, size;
>>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>>> index 1b2a7d208596..24abd2e88efe 100644
>>> --- a/mm/slab_common.c
>>> +++ b/mm/slab_common.c
>>> @@ -421,6 +421,7 @@ static void do_kmem_cache_release(struct list_head *release,
>>> void memcg_create_kmem_cache(struct mem_cgroup *memcg,
>>> struct kmem_cache *root_cache)
>>> {
>>> + static atomic64_t serial_nr_cursor = ATOMIC64_INIT(0);
>>> static char memcg_name_buf[NAME_MAX + 1]; /* protected by slab_mutex */
>>> struct memcg_cache_array *arr;
>>> struct kmem_cache *s = NULL;
>>> @@ -453,8 +454,15 @@ void memcg_create_kmem_cache(struct mem_cgroup *memcg,
>>> strlcpy(memcg_name_buf, cgroup_name(mem_cgroup_css(memcg)->cgroup),
>>> NAME_MAX + 1);
>>> rcu_read_unlock();
>>> - cache_name = kasprintf(GFP_KERNEL, "%s(%d:%s)", root_cache->name,
>>> - idx, memcg_name_buf);
>>> +
>>> + /*
>>> + * Assign a monotonically increasing serial number.
>>> + * It guarantees kmem_cache name uniqueness even if a mem cgroup goes
>>> + * offline, releases kmemcg_id and new mem cgroup gets same kmemcg_id.
>>> + */
>>> + memcg_set_serial_nr(memcg, atomic64_inc_return(&serial_nr_cursor));
>>
>> Here we are already held slab_mutex, so it looks like atomic is overkill.
>> Simple u64 should be enough.
>
> You are definitely right, thank you.
>
>>
>>> + cache_name = kasprintf(GFP_KERNEL, "%s(%llu:%s)", root_cache->name,
>>> + memcg_get_serial_nr(memcg), memcg_name_buf);
>>> if (!cache_name)
>>> goto out_unlock;
>>
>> Kirill
>> .
>>
More information about the Devel
mailing list