[Devel] [PATCH -mm 1/4] sl[au]b: do not charge large allocations to memcg

Vladimir Davydov vdavydov at parallels.com
Thu Mar 27 00:37:11 PDT 2014


Hi Greg,

On 03/27/2014 08:31 AM, Greg Thelen wrote:
> On Wed, Mar 26 2014, Vladimir Davydov <vdavydov at parallels.com> wrote:
>
>> We don't track any random page allocation, so we shouldn't track kmalloc
>> that falls back to the page allocator.
> This seems like a change which will leads to confusing (and arguably
> improper) kernel behavior.  I prefer the behavior prior to this patch.
>
> Before this change both of the following allocations are charged to
> memcg (assuming kmem accounting is enabled):
>  a = kmalloc(KMALLOC_MAX_CACHE_SIZE, GFP_KERNEL)
>  b = kmalloc(KMALLOC_MAX_CACHE_SIZE + 1, GFP_KERNEL)
>
> After this change only 'a' is charged; 'b' goes directly to page
> allocator which no longer does accounting.

Why do we need to charge 'b' in the first place? Can the userspace
trigger such allocations massively? If there can only be one or two such
allocations from a cgroup, is there any point in charging them?

In fact, do we actually need to charge every random kmem allocation? I
guess not. For instance, filesystems often allocate data shared among
all the FS users. It's wrong to charge such allocations to a particular
memcg, IMO. That said the next step is going to be adding a per kmem
cache flag specifying if allocations from this cache should be charged
so that accounting will work only for those caches that are marked so
explicitly.

There is one more argument for removing kmalloc_large accounting - we
don't have an easy way to track such allocations, which prevents us from
reparenting kmemcg charges on css offline. Of course, we could link
kmalloc_large pages in some sort of per-memcg list which would allow us
to find them on css offline, but I don't think such a complication is
justified.

Thanks.



More information about the Devel mailing list