[Devel] Re: [RFC] [-mm PATCH] Memory controller fix swap charging context in unuse_pte()

Hugh Dickins hugh at veritas.com
Thu Oct 25 12:33:36 PDT 2007


On Wed, 24 Oct 2007, Balbir Singh wrote:
> Hugh Dickins wrote:
> > 
> > Thanks, Balbir.  Sorry for the delay.  I've not forgotten our
> > agreement that I should be splitting it into before-and-after
> > mem cgroup patches.  But it's low priority for me until we're
> > genuinely assigning to a cgroup there.  Hope to get back to
> > looking into that tomorrow, but no promises.
> 
> No Problem. We have some time with this one.

Phew - I still haven't got there.

> > I think you still see no problem, where I claim that simply
> > omitting the mem charge mods from mm/swap_state.c leads to OOMs?
> > Maybe our difference is because my memhog in the cgroup is using
> > more memory than RAM, not just more memory than allowed to the
> > cgroup.  I suspect that arrives at a state (when the swapcache
> > pages are not charged) where it cannot locate the pages it needs
> > to reclaim to stay within its limit.
> 
> Yes, in my case there I use memory less than RAM and more than that
> is allowed by the cgroup. It's quite possible that in your case the
> swapcache has grown significantly without any limit/control on it.
> The memhog program is using memory at a rate much higher than the
> rate of reclaim. Could you share your memhog program, please?

Gosh, it's nothing special.  Appended below, but please don't shame
me by taking it too seriously.  Defaults to working on a 600M mmap
because I'm in the habit of booting mem=512M.  You probably have
something better yourself that you'd rather use.

> In the use case you've mentioned/tested, having these mods to
> control swapcache is actually useful, right?

No idea what you mean by "these mods to control swapcache"?

With your mem_cgroup mods in mm/swap_state.c, swapoff assigns
the pages read in from swap to whoever's running swapoff and your
unuse_pte mem_cgroup_charge never does anything useful: swap pages
should get assigned to the appropriate cgroups at that point.

Without your mem_cgroup mods in mm/swap_state.c, unuse_pte makes
the right assignments (I believe).  But I find that swapout (using
600M in a 512M machine) from a 200M cgroup quickly OOMs, whereas
it behaves correctly with your mm/swap_state.c.

Thought little yet about what happens to shmem swapped pages,
and swap readahead pages; but still suspect that they and the
above issue will need a "limbo" cgroup, for pages which are
expected to belong to a not-yet-identified mem cgroup.

> 
> Could you share your major objections at this point with the memory
> controller at this point. I hope to be able to look into/resolve them
> as my first priority in my list of items to work on.

The things I've noticed so far, as mentioned before and above.

But it does worry me that I only came here through finding swapoff
broken by that unuse_mm return value, and then found one issue
after another.  It feels like the mem cgroup people haven't really
thought through or tested swap at all, and that if I looked further
I'd uncover more.

That's simply FUD, and I apologize if I'm being unfair: but that
is how it feels, and I expect we all know that phase in a project
when solving one problem uncovers three - suggests it's not ready.

Hugh

/* swapout.c */
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <sys/mman.h>

int main(int argc, char *argv[])
{
	unsigned long *base = (unsigned long *)0x08400000;
	unsigned long size;
	unsigned long limit;
	unsigned long i;
	char *ptr = NULL;

	size = argv[1]? strtoul(argv[1], &ptr, 0): 600;
	if (size >= 3*1024)
		size = 0;
	size *= 1024*1024;
	limit = size / sizeof(unsigned long);
	if (size == 0 || base + limit + 1024 > &size) {
		errno = EINVAL;
		perror("swapout");
		exit(1);
	}
	base = mmap(base, size, PROT_READ|PROT_WRITE,
		    MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
	if (base == (unsigned long *)(-1)) {
		perror("mmap");
		exit(1);
	}
	for (i = 0; i < limit; i++)
		base[i] = i;
	if (ptr && *ptr == '.') {
		printf("Type <Return> to continue ");
		fflush(stdout);
		getchar();
	}
	for (i = 0; i < limit; i++)
		base[i] = limit - i;
	return 0;
}
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers




More information about the Devel mailing list