[Devel] Re: [PATCH v3 04/13] kmem accounting basic infrastructure

Johannes Weiner hannes at cmpxchg.org
Wed Sep 26 15:11:36 PDT 2012


On Thu, Sep 27, 2012 at 12:02:14AM +0400, Glauber Costa wrote:
> On 09/26/2012 11:56 PM, Tejun Heo wrote:
> > Hello,
> > 
> > On Wed, Sep 26, 2012 at 11:46:37PM +0400, Glauber Costa wrote:
> >> Besides not being part of cgroup core, and respecting very much both
> >> cgroups' and basic sanity properties, kmem is an actual feature that
> >> some people want, and some people don't. There is no reason to believe
> >> that applications that want it will live in the same environment as ones
> >> that don't.
> > 
> > I don't know.  It definitely is less crazy than .use_hierarchy but I
> > wouldn't say it's an inherently different thing.  I mean, what does it
> > even mean to have u+k limit on one subtree and not on another branch?
> > And we worry about things like what if parent doesn't enable it but
> > its children do.
> > 
> 
> It is inherently different. To begin with, it actually contemplates two
> use cases. It is not a work around.
> 
> The meaning is also very well defined. The meaning of having this
> enabled in one subtree and not in another is: Subtree A wants to track
> kernel memory. Subtree B does not. It's that, and never more than that.
> There are no maybes and no buts, no magic knobs that make it behave in a
> crazy way.
> 
> If a child enables it but the parent does not, this does what every
> tree does: enable it from that point downwards.
> 
> > This is a feature which adds complexity.  If the feature is necessary
> > and justified, sure.  If not, let's please not and let's err on the
> > side of conservativeness.  We can always add it later but the other
> > direction is much harder.
> 
> I disagree. Having kmem tracking adds complexity. Having to cope with
> the use case where we turn it on dynamically to cope with the "user page
> only" use case adds complexity. But I see no significant complexity
> being added by having it per subtree. Really.

Maybe not in code, but you are adding an extra variable into the
system.  "One switch per subtree" is more complex than "one switch."
Yes, the toggle is hidden behind setting the limit, but it's still a
toggle.  The use_hierarchy complexity comes not from the file that
enables it, but from the resulting semantics.
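
To illustrate the "extra variable" point, here is a toy model, not kernel
code. The Group class and both helper functions are hypothetical names made
up for this sketch, and it assumes the "enable it from that point downwards"
semantics described in the quoted text. With one machine-wide switch, the
answer to "is kmem tracked here?" is a single boolean. With per-subtree
switches, it depends on where the group sits in the tree.

```python
class Group:
    """Hypothetical cgroup-like node (illustration only, not kernel code)."""

    def __init__(self, name, parent=None, kmem_enabled=False):
        self.name = name
        self.parent = parent
        # Local toggle; in the proposal it is flipped by setting a kmem limit.
        self.kmem_enabled = kmem_enabled


def tracked_per_subtree(group):
    """Per-subtree switch: tracked if this group or any ancestor enabled it."""
    node = group
    while node is not None:
        if node.kmem_enabled:
            return True
        node = node.parent
    return False


def tracked_global(group, machine_wide_kmem):
    """One switch: the group's position in the tree is irrelevant."""
    return machine_wide_kmem


root = Group("root")
a = Group("A", parent=root, kmem_enabled=True)   # subtree A opts in
a1 = Group("A/1", parent=a)                      # inherits from A
b = Group("B", parent=root)                      # subtree B never opted in

for g in (root, a, a1, b):
    print(g.name, tracked_per_subtree(g), tracked_global(g, True))
```

In the per-subtree model the admin has to reason about every path from the
root to a group. In the global model there is exactly one thing to know.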

kmem accounting is expensive and we definitely want to allow enabling
it separately from traditional user memory accounting.  But I think
there is no good reason to not demand an all-or-nothing answer from
the admin; either he wants kmem tracking on a machine or not.  At
least you haven't presented a convincing case, IMO.

I don't think there is strong/any demand for per-node toggles, but
once we add this behavior, people will rely on it and expect kmem
tracking to stay local and we are stuck with it.  Adding it for the
reason that people will use it is a self-fulfilling prophecy.

> You have the use_hierarchy fiasco in mind, and I do understand that you
> are raising the flag and all that.
> 
> But think in terms of functionality: This thing here is a lot more
> similar to swap than use_hierarchy. Would you argue that memsw should be
> per-root ?

We actually do have a per-root flag that controls accounting for swap.

> The reason why it shouldn't: Some people want to limit memory
> consumption all the way to the swap, some people don't. Same with kmem.

That lies in the nature of the interface: we chose k & u+k rather than
u & u+k, so our memory.limit_in_bytes will necessarily include kmem,
while swap is not included there.  But I really doubt that there is a
strong case for turning on swap accounting intentionally and then
limiting memory+swap only on certain subtrees.  Where would be the
sense in that?
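
A minimal sketch of what "k & u+k" means for charging. It assumes a kernel
allocation is charged against both the kmem counter and the main memory
counter, while a user page is charged against the main counter only.
Counter, Memcg and the method names are hypothetical and chosen for this
example; this is a toy model, not the patch's code.

```python
class Counter:
    """Simple usage counter with a hard limit (illustration only)."""

    def __init__(self, limit):
        self.usage = 0
        self.limit = limit

    def try_charge(self, nbytes):
        if self.usage + nbytes > self.limit:
            return False
        self.usage += nbytes
        return True

    def uncharge(self, nbytes):
        self.usage -= nbytes


class Memcg:
    """Hypothetical group with a 'k' limit and a 'u+k' limit."""

    def __init__(self, limit, kmem_limit):
        self.mem = Counter(limit)        # models memory.limit_in_bytes (u+k)
        self.kmem = Counter(kmem_limit)  # models the kmem limit (k)

    def charge_user(self, nbytes):
        # User memory counts only against u+k.
        return self.mem.try_charge(nbytes)

    def charge_kernel(self, nbytes):
        # Kernel memory counts against k and against u+k.
        if not self.kmem.try_charge(nbytes):
            return False
        if not self.mem.try_charge(nbytes):
            self.kmem.uncharge(nbytes)   # roll back the partial charge
            return False
        return True


g = Memcg(limit=100, kmem_limit=40)
ok_user = g.charge_user(70)      # u+k usage becomes 70
ok_k1 = g.charge_kernel(20)      # k usage 20, u+k usage 90
ok_k2 = g.charge_kernel(20)      # fits in k (40) but would push u+k to 110
print(ok_user, ok_k1, ok_k2, g.mem.usage, g.kmem.usage)
```

The last charge fails on the u+k limit even though the k limit alone would
allow it, which is why the main limit necessarily includes kmem once
tracking is on.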



