[Devel] Re: [PATCH v2 02/13] memcg: Kernel memory accounting infrastructure.
Glauber Costa
glommer at parallels.com
Tue Mar 13 03:37:30 PDT 2012
> After looking at the code, I think we need to consider
> whether independent_kmem_limit is a good idea or not.
>
> How about adding MEMCG_KMEM_ACCOUNT flag instead of this and use only
> memcg->res/memcg->memsw rather than adding a new counter, memcg->kmem ?
>
> if MEMCG_KMEM_ACCOUNT is set -> slab is accounted to mem->res/memsw.
> if MEMCG_KMEM_ACCOUNT is not set -> slab is never accounted.
>
> (I think an on/off switch is required.)
>
> Thanks,
> -Kame
>
This has been discussed before; I can probably find it in the archives
if you want to go back and see it.
But in a nutshell:
1) Supposing the independent knob disappears (I will explain in item 2 why I
don't want it to), I don't think a flag makes sense either. *If* we are
planning to enable/disable this, it might make more sense to put some
work into it and allow particular slabs to be enabled/disabled by writing
to memory.kmem.slabinfo (-* would disable all, +* enable all, +kmalloc*
enable all kmalloc caches, etc.).
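A minimal userspace sketch of the proposed memory.kmem.slabinfo write syntax, assuming each command is a single "+pattern" or "-pattern" string and that patterns use shell-style globbing. The struct slab_entry type and the slabinfo_apply() function are hypothetical, and POSIX fnmatch() stands in for whatever matcher the kernel side would actually use:

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cgroup view of one slab cache: its name and whether
 * allocations from it are charged to the cgroup. */
struct slab_entry {
    const char *name;
    bool accounted;
};

/* Apply one command written to memory.kmem.slabinfo (proposed syntax):
 * "+pattern" enables accounting for every cache whose name matches the
 * shell-style pattern, "-pattern" disables it.
 * Returns the number of caches matched, or -1 for a malformed command. */
int slabinfo_apply(struct slab_entry *caches, size_t n, const char *cmd)
{
    bool enable;
    int matched = 0;

    if (cmd[0] == '+')
        enable = true;
    else if (cmd[0] == '-')
        enable = false;
    else
        return -1;

    const char *pattern = cmd + 1;
    if (*pattern == '\0')
        return -1;              /* a bare "+" or "-" is rejected */

    for (size_t i = 0; i < n; i++) {
        /* fnmatch() returns 0 when the name matches the pattern */
        if (fnmatch(pattern, caches[i].name, 0) == 0) {
            caches[i].accounted = enable;
            matched++;
        }
    }
    return matched;
}
```

Commands are applied in order, so writing "-*" followed by "+kmalloc*" would leave only the kmalloc caches accounted.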
Alternatively, we could do something similar to what ended up being
done for tcp at the request of the network people: if you never touch
the limit file, we don't bother with it at all and simply don't account.
With Suleiman's lazy allocation infrastructure, that should actually be
trivial. And then again, a flag is not necessary, because writing to the
limit file does the job and also conveys the meaning well enough.
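A sketch of the lazy, limit-file-driven activation, assuming a single kmem counter per cgroup. The struct kmem_state type and the kmem_set_limit()/kmem_charge() functions are hypothetical names, and the code is single-threaded for clarity:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified kmem state for one memcg (not thread-safe). */
struct kmem_state {
    bool activated;     /* set on the first write to the limit file */
    uint64_t usage;     /* bytes currently charged */
    uint64_t limit;     /* bytes allowed once activated */
};

/* Write handler for the limit file: the write itself is the on switch. */
void kmem_set_limit(struct kmem_state *s, uint64_t limit)
{
    s->limit = limit;
    s->activated = true;
}

/* Charge path. Until the limit file has been touched nothing is
 * accounted, so unconfigured cgroups pay only for one branch.
 * Returns 0 on success, -1 if the charge would exceed the limit. */
int kmem_charge(struct kmem_state *s, uint64_t bytes)
{
    if (!s->activated)
        return 0;
    /* written this way to avoid unsigned overflow/underflow */
    if (s->usage > s->limit || bytes > s->limit - s->usage)
        return -1;
    s->usage += bytes;
    return 0;
}
```

No separate flag file is involved: a cgroup that never writes the limit is simply never accounted.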
2) For the kernel itself, we are mostly concerned that a malicious
container may pin large amounts of kernel memory, which is ultimately
unreclaimable. In particular, in scenarios where overcommit is allowed,
you can fill the whole physical memory (or at least a significant part
of it) with those objects, well beyond your soft limit allowance, making
the creation of further containers impossible.
With user memory, you can reclaim the cgroup back to its place. With
kernel memory, you can't.
In the particular example of 32-bit boxes, you can easily fill up a
large part of the 1 GB available for kernel memory with pinned memory
and render the whole system unresponsive.
Never allowing kernel memory to go beyond the soft limit was one of
the proposed alternatives. However, it may force you to establish a soft
limit where one was not previously needed, or to establish a low soft
limit when you really need a bigger one.
All that said, while reading your message and thinking a bit, the
following crossed my mind:
- We can account the slabs to memcg->res normally, and just store the
information that this is kernel memory into a percpu counter, as
I proposed recently.
- The knob goes away, and becomes implicit: if you ever write anything
to memory.kmem.limit_in_bytes, we transfer that memory to a separate
kmem res_counter, and proceed from there. We can keep accounting to
memcg->res anyway; it's just that kernel memory will now also have a
separate limit.
- With this scheme, we may never need a
memory.kmem.soft_limit_in_bytes file. Reclaim is always part of the normal
memcg reclaim.
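The three points above can be sketched roughly as follows. This is a simplified, single-threaded model under stated assumptions: struct counter stands in for a res_counter, kmem_stat stands in for the per-cpu kernel-memory statistic, and all type and function names are hypothetical rather than existing memcg code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a res_counter (no locking, no failcnt). */
struct counter {
    uint64_t usage;
    uint64_t limit;
};

static int counter_charge(struct counter *c, uint64_t bytes)
{
    if (c->usage > c->limit || bytes > c->limit - c->usage)
        return -1;
    c->usage += bytes;
    return 0;
}

static void counter_uncharge(struct counter *c, uint64_t bytes)
{
    c->usage -= bytes;
}

/* Hypothetical memcg layout for the proposed scheme. */
struct memcg {
    struct counter res;   /* user + kernel memory, as today */
    uint64_t kmem_stat;   /* how much of res is kernel memory (per-cpu in the proposal) */
    bool kmem_limited;    /* set once memory.kmem.limit_in_bytes is written */
    struct counter kmem;  /* separate kmem limit, used only when kmem_limited */
};

/* Writing memory.kmem.limit_in_bytes: carry the kernel memory already
 * charged to res over into the new kmem counter, then enforce its limit.
 * res keeps being charged as before. */
void memcg_set_kmem_limit(struct memcg *m, uint64_t limit)
{
    if (!m->kmem_limited) {
        m->kmem.usage = m->kmem_stat;
        m->kmem_limited = true;
    }
    m->kmem.limit = limit;
}

/* Slab charge: always goes to res, so normal memcg reclaim sees it. */
int memcg_charge_kmem(struct memcg *m, uint64_t bytes)
{
    if (counter_charge(&m->res, bytes))
        return -1;                        /* over the overall limit */
    if (m->kmem_limited && counter_charge(&m->kmem, bytes)) {
        counter_uncharge(&m->res, bytes); /* roll back the res charge */
        return -1;
    }
    m->kmem_stat += bytes;
    return 0;
}

void memcg_uncharge_kmem(struct memcg *m, uint64_t bytes)
{
    counter_uncharge(&m->res, bytes);
    if (m->kmem_limited)
        counter_uncharge(&m->kmem, bytes);
    m->kmem_stat -= bytes;
}
```

Because every slab charge still lands in res first, hitting the overall limit goes down the normal memcg reclaim path; the separate kmem counter only adds a second, stricter check once someone has asked for it.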
The scheme outlined above would work for us and make the whole thing
simpler, I believe.
What do you think?