[Devel] Re: [PATCH] x86: only clear node_states for 64bit

Yinghai Lu yinghai at kernel.org
Sat Jun 27 13:40:18 PDT 2009


Ingo Molnar wrote:
> * Yinghai Lu <yinghai at kernel.org> wrote:
> 
>> Andrew Morton wrote:
>>> On Mon, 22 Jun 2009 08:38:50 -0700
>>> Yinghai Lu <yinghai at kernel.org> wrote:
>>>
>>>> Nathan reported that
>>>> | commit 73d60b7f747176dbdff826c4127d22e1fd3f9f74
>>>> | Author: Yinghai Lu <yinghai at kernel.org>
>>>> | Date:   Tue Jun 16 15:33:00 2009 -0700
>>>> |
>>>> |    page-allocator: clear N_HIGH_MEMORY map before we set it again
>>>> |    
>>>> |    SRAT tables may contains nodes of very small size.  The arch code may
>>>> |    decide to not activate such a node.  However, currently the early boot
>>>> |    code sets N_HIGH_MEMORY for such nodes.  These nodes therefore seem to be
>>>> |    active although these nodes have no present pages.
>>>> |    
>>>> |    For 64bit N_HIGH_MEMORY == N_NORMAL_MEMORY, so that works for 64 bit too
>>>>
>>>> the cpuset.mems cgroup attribute on an i386 kvm guest
>>>>
>>>> fix it by only clearing node_states[N_NORMAL_MEMORY] for 64bit only.
>>>> and need to do save/restore for that in find_zone_movable_pfn
>>>>
>>> There appear to be some words omitted from this changelog - it doesn't
>>> make sense.
>>>
>>> I think that perhaps a line got deleted before "the cpuset.mems cgroup
>>> ...".  That was the line which actualy describes the bug which we're
>>> fixing.  Or perhaps it was a single word?  "zeroes".
>>>
>>>
>>> I did this:
>>>
>>> Nathan reported that
>>> : 
>>> : | commit 73d60b7f747176dbdff826c4127d22e1fd3f9f74
>>> : | Author: Yinghai Lu <yinghai at kernel.org>
>>> : | Date:   Tue Jun 16 15:33:00 2009 -0700
>>> : |
>>> : |    page-allocator: clear N_HIGH_MEMORY map before we set it again
>>> : |
>>> : |    SRAT tables may contains nodes of very small size.  The arch code may
>>> : |    decide to not activate such a node.  However, currently the early boot
>>> : |    code sets N_HIGH_MEMORY for such nodes.  These nodes therefore seem to be
>>> : |    active although these nodes have no present pages.
>>> : |
>>> : |    For 64bit N_HIGH_MEMORY == N_NORMAL_MEMORY, so that works for 64 bit too
>>> : 
>> "
>>> : unintentionally and incorrectly clears the cpuset.mems cgroup attribute on
>>> : an i386 kvm guest
>> "
>> ==> 
>>
>> 32bit assume NORMAL_MEMORY bit and HIGH_MEMORY bit are set for 
>> Node0 always.
> 
> Where in the code is this assumption?

in mm/page_alloc.c
/*
 * Array of node states.
 */
nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
        [N_POSSIBLE] = NODE_MASK_ALL,
        [N_ONLINE] = { { [0] = 1UL } },
#ifndef CONFIG_NUMA
        [N_NORMAL_MEMORY] = { { [0] = 1UL } },
#ifdef CONFIG_HIGHMEM
        [N_HIGH_MEMORY] = { { [0] = 1UL } },
#endif
        [N_CPU] = { { [0] = 1UL } },
#endif  /* NUMA */
};
EXPORT_SYMBOL(node_states);

for x86 64bit, we clear POSSIBLE and ONLINE in arch/x86/mm/numa_64.c::initmem_init
and this patch clear NORMAL in arch/x86/mm/init_64.c::paging_init

for x86 32bit: ONLINE get cleared in get_memcfg_from_srat()
and NORMAL and HIGH_MEMORY are not cleared
before try to set new in mm/page_alloc.c::free_area_init_nodes

> 
>> and some code only check if HIGH_MEMORY is there to know if 
>> NORMAL_MEMORY is there.
> 
> Which code is that exactly?
> 
with grep:
arch/x86/mm/init_64.c:	nodes_clear(node_states[N_NORMAL_MEMORY]);
drivers/base/node.c:	return print_nodes_state(N_NORMAL_MEMORY, buf);
include/linux/nodemask.h:	N_NORMAL_MEMORY,	/* The node has regular memory */
include/linux/nodemask.h:	N_HIGH_MEMORY = N_NORMAL_MEMORY,
mm/memcontrol.c:	if (!node_state(node, N_NORMAL_MEMORY))
mm/page_alloc.c:	[N_NORMAL_MEMORY] = { { [0] = 1UL } },
mm/page_alloc.c:			node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY) {
mm/slub.c:		for_each_node_state(node, N_NORMAL_MEMORY) {
mm/slub.c:		for_each_node_state(node, N_NORMAL_MEMORY) {
mm/slub.c:	for_each_node_state(node, N_NORMAL_MEMORY)

Documentation/cgroups/cpusets.txt:automatically tracks the value of node_states[N_HIGH_MEMORY]--i.e.,
Documentation/memory-hotplug.txt:status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
arch/ia64/kernel/uncached.c:		if (!node_state(nid, N_HIGH_MEMORY))
drivers/base/node.c:	return print_nodes_state(N_HIGH_MEMORY, buf);
include/linux/cpuset.h:#define cpuset_current_mems_allowed (node_states[N_HIGH_MEMORY])
include/linux/nodemask.h:	N_HIGH_MEMORY,		/* The node has regular or high memory */
include/linux/nodemask.h:	N_HIGH_MEMORY = N_NORMAL_MEMORY,
kernel/cpuset.c: * found any online mems, return node_states[N_HIGH_MEMORY].
kernel/cpuset.c: * of node_states[N_HIGH_MEMORY].
kernel/cpuset.c:					node_states[N_HIGH_MEMORY]))
kernel/cpuset.c:					node_states[N_HIGH_MEMORY]);
kernel/cpuset.c:		*pmask = node_states[N_HIGH_MEMORY];
kernel/cpuset.c:	BUG_ON(!nodes_intersects(*pmask, node_states[N_HIGH_MEMORY]));
kernel/cpuset.c:	 * top_cpuset.mems_allowed tracks node_stats[N_HIGH_MEMORY];
kernel/cpuset.c:				node_states[N_HIGH_MEMORY]))
kernel/cpuset.c:		    nodes_subset(cp->mems_allowed, node_states[N_HIGH_MEMORY]))
kernel/cpuset.c:						node_states[N_HIGH_MEMORY]);
kernel/cpuset.c: * Keep top_cpuset.mems_allowed tracking node_states[N_HIGH_MEMORY].
kernel/cpuset.c: * Call this routine anytime after node_states[N_HIGH_MEMORY] changes.
kernel/cpuset.c:		top_cpuset.mems_allowed = node_states[N_HIGH_MEMORY];
kernel/cpuset.c:	top_cpuset.mems_allowed = node_states[N_HIGH_MEMORY];
kernel/cpuset.c: * subset of node_states[N_HIGH_MEMORY], even if this means going outside the
mm/memcontrol.c:		for_each_node_state(node, N_HIGH_MEMORY) {
mm/memory_hotplug.c:		node_set_state(zone_to_nid(zone), N_HIGH_MEMORY);
mm/mempolicy.c:	if (!nodes_subset(new, node_states[N_HIGH_MEMORY])) {
mm/mempolicy.c:	for_each_node_state(nid, N_HIGH_MEMORY) {
mm/mempolicy.c:		if (!nodes_subset(nodes, node_states[N_HIGH_MEMORY]))
mm/mempolicy.c:			nodes = node_states[N_HIGH_MEMORY];
mm/mempolicy.c:			&node_states[N_HIGH_MEMORY], MPOL_MF_STATS, md);
mm/mempolicy.c:	for_each_node_state(n, N_HIGH_MEMORY)
mm/migrate.c:			if (!node_state(node, N_HIGH_MEMORY))
mm/oom_kill.c:	nodemask_t nodes = node_states[N_HIGH_MEMORY];
mm/page-writeback.c:	for_each_node_state(node, N_HIGH_MEMORY) {
mm/page_alloc.c:	[N_HIGH_MEMORY] = { { [0] = 1UL } },
mm/page_alloc.c: * tasks mems_allowed, or node_states[N_HIGH_MEMORY].)
mm/page_alloc.c:					&node_states[N_HIGH_MEMORY];
mm/page_alloc.c:	for_each_node_state(n, N_HIGH_MEMORY) {
mm/page_alloc.c:				(nodes_weight(node_states[N_HIGH_MEMORY]) + 1);
mm/page_alloc.c: * Populate N_HIGH_MEMORY for calculating usable_nodes.
mm/page_alloc.c:			node_set_state(early_node_map[i].nid, N_HIGH_MEMORY);
mm/page_alloc.c:	nodemask_t saved_node_state = node_states[N_HIGH_MEMORY];
mm/page_alloc.c:	int usable_nodes = nodes_weight(node_states[N_HIGH_MEMORY]);
mm/page_alloc.c:	for_each_node_state(nid, N_HIGH_MEMORY) {
mm/page_alloc.c:	node_states[N_HIGH_MEMORY] = saved_node_state;
mm/page_alloc.c:			node_set_state(nid, N_HIGH_MEMORY);
mm/vmalloc.c:		for_each_node_state(nr, N_HIGH_MEMORY)
mm/vmscan.c:		for_each_node_state(nid, N_HIGH_MEMORY) {
mm/vmscan.c:	for_each_node_state(nid, N_HIGH_MEMORY)
mm/vmstat.c:	if (!node_state(pgdat->node_id, N_HIGH_MEMORY))

for 64bit N_HIGH_MEMORY == NORMAL_MEMORY

for 32bit, there are more reference to N_HIGH_MEMORY...

YH

_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers




More information about the Devel mailing list