[Devel] memcg: mem_cgroup_uncharge_page() kernel panic/lockup
Anatoly Stepanov
astepanov at cloudlinux.com
Sun Jun 12 23:24:33 PDT 2016
Hello everyone!
We are hitting an issue in the mem_cgroup_uncharge_page() function;
it shows up quite often on our clients' servers.
The issue sometimes leads to a hard lockup and sometimes to a GP fault.
Based on bug reports from clients, the problem shows up when a user
process calls the "execve" or "exit" syscalls.
As we know, in those cases the kernel invokes "uncharging" for every page
when it is unmapped from all the mm's.
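
As a cross-check, the user-mode RAX values in the backtraces below (3b in the
1st and 3rd, e7 in the 2nd) can be decoded as syscall numbers. A minimal
sketch, assuming the standard x86-64 syscall table; the table excerpt and the
helper name syscall_name are illustrative only:

```python
# Excerpt of the x86-64 syscall table, limited to the calls relevant here.
SYSCALLS = {59: "execve", 60: "exit", 231: "exit_group"}


def syscall_name(rax_hex):
    """Map a hex RAX value from a crash 'bt' dump to a syscall name."""
    return SYSCALLS.get(int(rax_hex, 16), "unknown")


if __name__ == "__main__":
    # User-mode RAX values copied from the backtraces in this report.
    for rax in ("000000000000003b", "00000000000000e7"):
        print(rax, "->", syscall_name(rax))
```

Both values map to execve and exit_group, which matches the sys_execve and
sys_exit_group frames in the traces.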
Kernel dump analysis shows that at the moment of the
mem_cgroup_uncharge_page() call, the "memcg" pointer
(taken from page_cgroup) seems to point to some random memory area.
On the other hand, if we look at current->mm->css, the memcg instance
exists and is "online".
This led me to think that "page_cgroup->memcg" may be changed by
some other part of the memcg code in parallel.
As far as I understand, the only candidate here is the "reclaim code path"
(maybe I'm wrong).
So I suppose there might be a race between the "memcg uncharge code" and
the "memcg reclaim code".
Please give me your thoughts about it.
Thanks
P.S.:
Additional info:
Kernel: rh7-3.10.0-327.10.1.vz7.12.14
************************ 1st BT ************************:
PID: 972445 TASK: ffff88065d53d8d0 CPU: 0 COMMAND: "httpd"
#0 [ffff880224f37818] machine_kexec at ffffffff8105249b
#1 [ffff880224f37878] crash_kexec at ffffffff81103532
#2 [ffff880224f37948] oops_end at ffffffff81641628
#3 [ffff880224f37970] die at ffffffff810184cb
#4 [ffff880224f379a0] do_general_protection at ffffffff81640f24
#5 [ffff880224f379d0] general_protection at ffffffff81640768
[exception RIP: mem_cgroup_charge_statistics+19]
RIP: ffffffff811e7733 RSP: ffff880224f37a80 RFLAGS: 00010202
RAX: ffffffffffffffff RBX: ffff8807b26f0110 RCX: 00000000ffffffff
RDX: 79726f6765746163 RSI: ffffea000c9c0440 RDI: ffff8806a55662f8
RBP: ffff880224f37a80 R8: 0000000000000000 R9: 0000000003808000
R10: 00000000000000b8 R11: ffffea001eaa8980 R12: ffffea000c9c0440
R13: 0000000000000001 R14: 0000000000000000 R15: ffff8806a5566000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#6 [ffff880224f37a88] __mem_cgroup_uncharge_common at ffffffff811e9ddf
#7 [ffff880224f37ac8] mem_cgroup_uncharge_page at ffffffff811ee99a
#8 [ffff880224f37ad8] page_remove_rmap at ffffffff811b9ec9
#9 [ffff880224f37b10] unmap_page_range at ffffffff811ab580
#10 [ffff880224f37bf8] unmap_single_vma at ffffffff811aba11
#11 [ffff880224f37c30] unmap_vmas at ffffffff811ace79
#12 [ffff880224f37c68] exit_mmap at ffffffff811b663c
#13 [ffff880224f37d18] mmput at ffffffff8107853b
#14 [ffff880224f37d38] flush_old_exec at ffffffff81202547
#15 [ffff880224f37d88] load_elf_binary at ffffffff8125883c
#16 [ffff880224f37e58] search_binary_handler at ffffffff81201c25
#17 [ffff880224f37ea0] do_execve_common at ffffffff812032b7
#18 [ffff880224f37f30] sys_execve at ffffffff81203619
#19 [ffff880224f37f50] stub_execve at ffffffff81649369
RIP: 00007f54284b3287 RSP: 00007ffda57a0698 RFLAGS: 00000297
RAX: 000000000000003b RBX: 00000000037c5fe8 RCX: ffffffffffffffff
RDX: 00000000037cf3f8 RSI: 00000000037ce5f8 RDI: 00007f5425fcabf1
RBP: 00007ffda57a0750 R8: 0000000000000001 R9: 0000000000000000
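
One detail in the 1st BT fits the "random memory" observation: RDX holds
79726f6765746163, which reads as little-endian ASCII text and is not a
canonical x86-64 address. Dereferencing a non-canonical address raises a
general protection fault rather than a page fault. Whether RDX was actually
loaded through the bad memcg pointer is an assumption; the excerpt alone does
not confirm it. A minimal sketch of the check, assuming 48-bit virtual
addresses (4-level paging); the helper names are illustrative:

```python
def decode_le_ascii(reg_hex):
    """Interpret a 64-bit register value as 8 little-endian ASCII bytes."""
    raw = int(reg_hex, 16).to_bytes(8, "little")
    # Replace non-printable bytes with '.' so the output stays readable.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


def is_canonical(reg_hex):
    """True if bits 63..47 are all equal (48-bit virtual addressing assumed)."""
    top = int(reg_hex, 16) >> 47
    return top in (0, 0x1FFFF)


if __name__ == "__main__":
    rdx = "79726f6765746163"  # RDX from the 1st BT
    rdi = "ffff8806a55662f8"  # RDI from the 1st BT, for comparison
    print(decode_le_ascii(rdx))  # category
    print(is_canonical(rdx))     # False
    print(is_canonical(rdi))     # True
```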
************************ 2nd BT ************************:
PID: 168440 TASK: ffff88001e31cc20 CPU: 18 COMMAND: "httpd"
#0 [ffff88007255f838] machine_kexec at ffffffff8105249b
#1 [ffff88007255f898] crash_kexec at ffffffff81103532
#2 [ffff88007255f968] oops_end at ffffffff81641628
#3 [ffff88007255f990] no_context at ffffffff8163222b
#4 [ffff88007255f9e0] __bad_area_nosemaphore at ffffffff816322c1
#5 [ffff88007255fa30] bad_area_nosemaphore at ffffffff8163244a
#6 [ffff88007255fa40] __do_page_fault at ffffffff8164443e
#7 [ffff88007255faa0] trace_do_page_fault at ffffffff81644673
#8 [ffff88007255fad8] do_async_page_fault at ffffffff81643d59
#9 [ffff88007255faf0] async_page_fault at ffffffff816407f8
[exception RIP: memcg_check_events+435]
RIP: ffffffff811e9b53 RSP: ffff88007255fba0 RFLAGS: 00010246
RAX: 00000000f81ef81e RBX: ffff8802106d5000 RCX: 0000000000000000
RDX: 000000000000f81e RSI: 0000000000020000 RDI: ffff8807aa2642e8
RBP: ffff88007255fbf0 R8: 0000000000000202 R9: 0000000000000000
R10: 0000000000000010 R11: ffff88007255ffd8 R12: ffff8807aa2642e0
R13: 0000000000000410 R14: ffff8802073de700 R15: ffff8802106d5000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffff88007255fbf8] __mem_cgroup_uncharge_common at ffffffff811e9df2
#11 [ffff88007255fc38] mem_cgroup_uncharge_page at ffffffff811ee99a
#12 [ffff88007255fc48] page_remove_rmap at ffffffff811b9ec9
#13 [ffff88007255fc80] unmap_page_range at ffffffff811ab580
#14 [ffff88007255fd68] unmap_single_vma at ffffffff811aba11
#15 [ffff88007255fda0] unmap_vmas at ffffffff811ace79
#16 [ffff88007255fdd8] exit_mmap at ffffffff811b663c
#17 [ffff88007255fe88] mmput at ffffffff8107853b
#18 [ffff88007255fea8] do_exit at ffffffff81081d8c
#19 [ffff88007255ff40] do_group_exit at ffffffff8108266f
#20 [ffff88007255ff70] sys_exit_group at ffffffff810826e4
#21 [ffff88007255ff80] system_call_fastpath at ffffffff81648dc9
RIP: 00007fc210ea4259 RSP: 00007ffe20580fa8 RFLAGS: 00010206
RAX: 00000000000000e7 RBX: ffffffff81648dc9 RCX: 0000000000000000
************************ 3rd BT ************************:
PID: 1003121 TASK: ffff880036b58000 CPU: 1 COMMAND: "httpd"
#0 [ffff880237a459c8] machine_kexec at ffffffff8105249b
#1 [ffff880237a45a28] crash_kexec at ffffffff81103532
#2 [ffff880237a45af8] panic at ffffffff816329b0
#3 [ffff880237a45b78] watchdog_overflow_callback at ffffffff8112cee2
#4 [ffff880237a45b88] __perf_event_overflow at ffffffff81171c11
#5 [ffff880237a45c00] perf_event_overflow at ffffffff811726e4
#6 [ffff880237a45c10] intel_pmu_handle_irq at ffffffff81032e98
#7 [ffff880237a45e60] perf_event_nmi_handler at ffffffff8164206b
#8 [ffff880237a45e80] nmi_handle at ffffffff816417b9
#9 [ffff880237a45ec8] do_nmi at ffffffff816418d0
#10 [ffff880237a45ef0] end_repeat_nmi at ffffffff81640b93
[exception RIP: _raw_spin_lock+58]
RIP: ffffffff8163ff7a RSP: ffff88003e16fa28 RFLAGS: 00000006
RAX: 00000000000048f6 RBX: ffff8803edbab870 RCX: 0000000000006120
RDX: 0000000000006362 RSI: 0000000000006362 RDI: ffff8803edbab898
RBP: ffff88003e16fa28 R8: 0000000000000000 R9: 0000000002d98000
R10: 0000000000002295 R11: ffffea0010d1f080 R12: 0000000000000000
R13: ffff8803edbab870 R14: 0000000000000000 R15: ffff8803edbab898
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#11 [ffff88003e16fa28] _raw_spin_lock at ffffffff8163ff7a
#12 [ffff88003e16fa30] res_counter_uncharge_until at ffffffff81114df9
#13 [ffff88003e16fa78] res_counter_uncharge at ffffffff81114e73
#14 [ffff88003e16fa88] __mem_cgroup_uncharge_common at ffffffff811e9e7c
#15 [ffff88003e16fac8] mem_cgroup_uncharge_page at ffffffff811ee99a
#16 [ffff88003e16fad8] page_remove_rmap at ffffffff811b9ec9
#17 [ffff88003e16fb10] unmap_page_range at ffffffff811ab580
#18 [ffff88003e16fbf8] unmap_single_vma at ffffffff811aba11
#19 [ffff88003e16fc30] unmap_vmas at ffffffff811ace79
#20 [ffff88003e16fc68] exit_mmap at ffffffff811b663c
#21 [ffff88003e16fd18] mmput at ffffffff8107853b
#22 [ffff88003e16fd38] flush_old_exec at ffffffff81202547
#23 [ffff88003e16fd88] load_elf_binary at ffffffff8125883c
#24 [ffff88003e16fe58] search_binary_handler at ffffffff81201c25
#25 [ffff88003e16fea0] do_execve_common at ffffffff812032b7
#26 [ffff88003e16ff30] sys_execve at ffffffff81203619
#27 [ffff88003e16ff50] stub_execve at ffffffff81649369
RIP: 00007f54e8341287 RSP: 00007fffcd0d22e8 RFLAGS: 00000297
RAX: 000000000000003b RBX: 0000000002d8b2a0 RCX: ffffffffffffffff
RDX: 0000000002d8a810 RSI: 0000000002db4128 RDI: 00007f54e605cbf1
RBP: 00007fffcd0d23a0 R8: 0000000000000001 R9: 0000000000000000
R10: 00007fffcd0d2050 R11: 0000000000000297 R12: 0000000002d8a810
R13: 0000000002db3a50 R14: 0000000002da8440 R15: 0000000000000000
ORIG_RAX: 000000000000003b CS: 0033 SS: 002b
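
To make the common path across the three traces explicit, the frames can be
extracted and intersected. A minimal sketch, assuming the crash "bt" line
layout shown above; the function names and the shortened trace excerpts are
illustrative, with frame lines copied from the dumps in this report:

```python
import re

# Matches crash 'bt' frame lines such as:
#   #7 [ffff880224f37ac8] mem_cgroup_uncharge_page at ffffffff811ee99a
FRAME_RE = re.compile(
    r"^\s*#\d+\s+\[[0-9a-f]+\]\s+(\S+)\s+at\s+[0-9a-f]+", re.MULTILINE
)


def frames(bt_text):
    """Return the function names of a backtrace, innermost frame first."""
    return FRAME_RE.findall(bt_text)


def shared_frames(first, *others):
    """Functions that appear in every backtrace, in the order of the first."""
    other_sets = [set(frames(bt)) for bt in others]
    return [f for f in frames(first) if all(f in s for s in other_sets)]


BT1 = """
#5 [ffff880224f379d0] general_protection at ffffffff81640768
#6 [ffff880224f37a88] __mem_cgroup_uncharge_common at ffffffff811e9ddf
#7 [ffff880224f37ac8] mem_cgroup_uncharge_page at ffffffff811ee99a
#8 [ffff880224f37ad8] page_remove_rmap at ffffffff811b9ec9
#12 [ffff880224f37c68] exit_mmap at ffffffff811b663c
#14 [ffff880224f37d38] flush_old_exec at ffffffff81202547
"""
BT2 = """
#9 [ffff88007255faf0] async_page_fault at ffffffff816407f8
#10 [ffff88007255fbf8] __mem_cgroup_uncharge_common at ffffffff811e9df2
#11 [ffff88007255fc38] mem_cgroup_uncharge_page at ffffffff811ee99a
#12 [ffff88007255fc48] page_remove_rmap at ffffffff811b9ec9
#16 [ffff88007255fdd8] exit_mmap at ffffffff811b663c
#18 [ffff88007255fea8] do_exit at ffffffff81081d8c
"""
BT3 = """
#12 [ffff88003e16fa30] res_counter_uncharge_until at ffffffff81114df9
#14 [ffff88003e16fa88] __mem_cgroup_uncharge_common at ffffffff811e9e7c
#15 [ffff88003e16fac8] mem_cgroup_uncharge_page at ffffffff811ee99a
#16 [ffff88003e16fad8] page_remove_rmap at ffffffff811b9ec9
#20 [ffff88003e16fc68] exit_mmap at ffffffff811b663c
#22 [ffff88003e16fd38] flush_old_exec at ffffffff81202547
"""

if __name__ == "__main__":
    for name in shared_frames(BT1, BT2, BT3):
        print(name)
```

All three traces share the exit_mmap -> page_remove_rmap ->
mem_cgroup_uncharge_page -> __mem_cgroup_uncharge_common path, while the
return address inside __mem_cgroup_uncharge_common differs in each trace.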