[Devel] Re: [RFC] [PATCH] Cgroup based OOM killer controller

Thu Jan 22 02:14:24 PST 2009

On Thu, Jan 22, 2009 at 02:00:55AM -0800, David Rientjes (rientjes at google.com) wrote:
> 
> In an exclusive cpuset, a task's memory is restricted to a set of mems 
> that the administrator has designated.  If it is oom, the kernel must free 
> memory on those nodes or the next allocation will again trigger an oom 
> (leading to a needlessly killed task that was in a disjoint cpuset).
> 
> Really.

The whole point of the oom-killer is to kill the most appropriate task to
free memory. The task is selected system-wide, and some tunables have
been added to tweak the behaviour locally for some subsystems, yet this
cpuset feature is hardcoded into the selection algorithm.
When a tunable starts doing its own calculation, the behaviour of this
hardcoded feature changes.

That change is intentional, because the admin has to be able to tune the
system the way he needs rather than rely on some special heuristics,
which may not work all the time.

That is the point against the cpuset argument: make it tunable, the same
way oom_adj and/or this cgroup order feature are.
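
To make the distinction concrete, here is a rough user-space sketch of
victim selection where the cpuset constraint is a switch rather than a
fixed rule. It is a toy model, not kernel code: struct task_model,
select_victim() and the cpuset_filter flag are hypothetical names, and
it assumes the constraint simply skips tasks whose memory nodes do not
intersect the nodes that ran out of memory.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct task_model {
	int pid;
	unsigned long badness;	/* higher means a better candidate to kill */
	unsigned long mems;	/* bitmask of memory nodes the task may use */
};

/*
 * Return the index of the task with the highest badness, or -1 if no
 * task qualifies.  With cpuset_filter set, tasks whose nodes do not
 * intersect oom_mems are skipped (the assumed hardcoded behaviour);
 * with it cleared, every task is considered.
 */
int select_victim(const struct task_model *tasks, size_t n,
		  unsigned long oom_mems, bool cpuset_filter)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (cpuset_filter && !(tasks[i].mems & oom_mems))
			continue;
		if (best < 0 || tasks[i].badness > tasks[best].badness)
			best = (int)i;
	}
	return best;
}
```

With the flag set, the pick is limited to tasks sharing the exhausted
nodes; with it cleared, the highest score wins regardless of placement,
and the choice between the two is left to the admin.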

> > In this case administrator will not do this. It is up to him to decide
> > and not some inner kernel policy.
> > 
> 
> Then the scope of this new cgroup is restricted to not being used with 
> cpusets that could oom.

These are orthogonal tasks: cpusets limit one area of oom handling, the
cgroup order another. Some people need cpusets, others want cgroups.
cpusets are not so exceptional that they alone have to be taken into
account in a system-wide operation like OOM condition handling.

-- 
	Evgeniy Polyakov