[Devel] What does glommer think about kmem cgroup ?

Glauber Costa glommer at parallels.com
Thu Oct 13 08:50:29 PDT 2011


Hi guys,

So, LinuxCon is approaching. To help make our discussions more
productive, I sketched a basic prototype of a kmem cgroup that can
control the size of the dentry cache. I am sending the code here so you
guys can get an idea of it, but keep in mind this is a *sketch*. This is
my view of how our controller *could be*, not necessarily what it
*should be*. All your input is more than welcome.

Let me first explain a bit of my approach: (there are some comments 
inline as well)

* So far it only works with the slab (you will see that something
similar can be done for at least the slub). Since most of us are
concerned mostly with memory abuse (I think), I neglected for simplicity
the initial memory allocated for the arrays. Only when cache_grow is
called to allocate more pages do we bill them (see the first sketch
after this list).

* I avoid resorting to the shrinkers, trying to free the slab pages
themselves whenever possible (see the second sketch after this list).

* We don't limit the size of all caches. They have to register 
themselves explicitly (and in this PoC, I am using the dentry cache as
an example)

* The object is billed to whoever touched it first. Other policies are
of course possible.
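
To make the billing path a bit more concrete, here is a minimal
userspace model of the accounting I have in mind. It is not the patch
itself (that is attached at the end), and the names (kmem_cgroup,
kmem_cgroup_charge_pages and so on) are invented for illustration. The
point is just the flow: a cache that registered itself bills every page
it grows by to the cgroup of the task that caused the growth; crossing
the hard limit fails the allocation, crossing the soft limit only
signals that we should try to give slab pages back.

/* Userspace model of the accounting only; not the actual patch.
 * All names (kmem_cgroup, kmem_cgroup_charge_pages, ...) are made up
 * for illustration.
 */
#include <stdbool.h>
#include <stdio.h>

struct kmem_cgroup {
	size_t hard_limit;	/* charging past this fails the allocation */
	size_t soft_limit;	/* past this we try to free slab pages     */
	size_t usage;		/* bytes currently billed to this cgroup   */
};

/*
 * Would be called from the slab's cache_grow() path, once per page
 * (or page order) allocated, and only for caches that registered
 * themselves. The task that grows the cache is the one charged, so the
 * new page is billed to whoever touched it first.
 */
static bool kmem_cgroup_charge_pages(struct kmem_cgroup *cg, size_t bytes)
{
	if (cg->usage + bytes > cg->hard_limit)
		return false;		/* caller fails cache_grow */
	cg->usage += bytes;
	return true;
}

/* Called when the slab page goes back to the page allocator. */
static void kmem_cgroup_uncharge_pages(struct kmem_cgroup *cg, size_t bytes)
{
	cg->usage -= bytes;
}

int main(void)
{
	/* Mirrors the 16Mb/4Mb run below. */
	struct kmem_cgroup cg = { .hard_limit = 16 << 20,
				  .soft_limit = 4 << 20 };
	size_t page = 4096;
	unsigned long charged = 0, refused = 0;

	/* Pretend the dentry cache tries to grow by 32Mb worth of pages. */
	for (int i = 0; i < 8192; i++) {
		if (kmem_cgroup_charge_pages(&cg, page))
			charged++;
		else
			refused++;
	}
	printf("charged %lu pages (%zu bytes), refused %lu, over soft limit: %s\n",
	       charged, cg.usage, refused,
	       cg.usage > cg.soft_limit ? "yes" : "no");
	kmem_cgroup_uncharge_pages(&cg, cg.usage);
	return 0;
}

Plugged into cache_grow, the task that triggers the growth pays for the
whole page, which is how "billed to whoever touched it first" plays out
at page granularity in this sketch.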
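
And here is an equally rough model of the reclaim side: instead of
calling into the shrinkers, walk the cache's slabs and hand the fully
free ones straight back to the page allocator, uncharging as we go.
Again, the structures and names are made up for illustration; the real
thing has to live inside the slab internals.

/* Userspace model of the reclaim side; names and structures invented. */
#include <stdio.h>
#include <stdlib.h>

struct slab_page {
	struct slab_page *next;
	unsigned int inuse;		/* live objects on this slab */
};

struct accounted_cache {
	struct slab_page *slabs;	/* every slab this cache owns */
	size_t *cgroup_usage;		/* usage counter we uncharge  */
	size_t bytes_per_slab;
};

/*
 * Give fully free slabs straight back and uncharge them, without
 * touching the shrinkers. Slabs that still hold live objects are
 * skipped, so nothing is freed behind its user's back. Returns the
 * number of bytes handed back.
 */
static size_t cache_drain_free_slabs(struct accounted_cache *c)
{
	struct slab_page **link = &c->slabs;
	size_t freed = 0;

	while (*link) {
		struct slab_page *s = *link;

		if (s->inuse == 0) {
			*link = s->next;
			free(s);	/* stands in for freeing the page */
			*c->cgroup_usage -= c->bytes_per_slab;
			freed += c->bytes_per_slab;
		} else {
			link = &s->next;
		}
	}
	return freed;
}

int main(void)
{
	size_t usage = 5 * 4096;	/* five slab pages already billed */
	struct accounted_cache cache = { .cgroup_usage = &usage,
					 .bytes_per_slab = 4096 };

	/* Build five slabs, two of them completely free. */
	for (int i = 0; i < 5; i++) {
		struct slab_page *s = calloc(1, sizeof(*s));

		if (!s)
			return 1;
		s->inuse = (i % 2) ? 0 : 3;
		s->next = cache.slabs;
		cache.slabs = s;
	}

	size_t freed = cache_drain_free_slabs(&cache);
	printf("freed %zu bytes, usage now %zu bytes\n", freed, usage);
	return 0;
}

When nothing can be drained this way we are back to waiting for the
dentries to be released, which is also why a workload that pins dentries
for a long time will behave differently from the find run below.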

What I am *not* concerned about in this PoC: (left for future work, if
needed)
- unified user/kernel memory reclaim
- changes to the shrinkers
- changes to the limit once it is already in place
- per-cgroup display in /proc/slabinfo
- task movement
- a whole lot of other stuff.

* Hey glommer, do you have numbers?
Yes, I have 8 numbers. And since 8 is also a number, then I have 9 numbers.

So what I did was to run "find /" on a freshly booted system (my
laptop). I only ran each iteration once, so nothing scientific. I halved
the limits until the allocations started to fail, which happened more or
less around a 256K hard limit. find is also not a workload that pins the
dentries in memory for very long; other kinds of workloads will show
different results here...

Base: (non-patched kernel)
real	0m16.091s
user	0m0.567s
sys	0m6.649s

Patched kernel, root cgroup (unlimited; max used mem: 22Mb)
real	0m15.853s
user	0m0.511s
sys	0m6.417s

16Mb/4Mb (HardLimit/SoftLimit)
real	0m16.596s
user	0m0.560s
sys	0m6.947s

8Mb/4Mb
real	0m16.975s
user	0m0.568s
sys	0m7.047s


4Mb/2Mb
real	0m16.713s
user	0m0.554s
sys	0m7.022s

2Mb/1Mb
real	0m17.001s
user	0m0.544s
sys	0m7.118s

1Mb/512K
real	0m16.671s
user	0m0.530s
sys	0m7.067s

512k/256k
real	0m17.395s
user	0m0.567s
sys	0m7.179s

So, what those initial numbers do tell us is that the performance
penalty for the root cgroup is not expected to be that bad. When the
limits start to be hit, a penalty is incurred, which is within
expectations.
-------------- next part --------------
Attachment: basic-code.patch
URL: <http://lists.openvz.org/pipermail/devel/attachments/20111013/22e2dedf/attachment-0001.ksh>

