[Devel] memcg: mem_cgroup_uncharge_page() kernel panic/lockup

Anatoly Stepanov astepanov at cloudlinux.com
Wed Jun 15 15:13:51 PDT 2016


Hi, Vladimir!

Thanks for the quick response.

I've created a JIRA issue and uploaded the dumps.

All the information is included in the JIRA issue:
https://bugs.openvz.org/browse/OVZ-6756


On Wed, Jun 15, 2016 at 11:47 AM, Vladimir Davydov
<vdavydov at virtuozzo.com> wrote:
> Hi,
>
> Thanks for the report.
>
> Could you please
>
>  - file a bug at bugzilla.openvz.org
>
>  - upload the vmcore at
>    rsync://fe.sw.ru/f837d67c8e2ade8cee3367cb0f880268/
>
> On Mon, Jun 13, 2016 at 09:24:33AM +0300, Anatoly Stepanov wrote:
>> Hello everyone!
>>
>> We are encountering an issue with the mem_cgroup_uncharge_page()
>> function; it occurs quite often on our clients' servers.
>>
>> The issue sometimes leads to a hard lockup and sometimes to a general
>> protection (GP) fault.
>>
>> Based on bug reports from clients, the problem shows up when a user
>> process calls the "execve" or "exit" syscalls.
>> As far as we know, in those cases the kernel "uncharges" every page
>> once it has been unmapped from all the mm's.
>>
>> Kernel dump analysis shows that, at the moment of
>> mem_cgroup_uncharge_page(), the "memcg" pointer
>> (taken from page_cgroup) seems to point to some random memory area.
>>
>> On the other hand, if we look at current->mm->css, the memcg instance
>> exists and is "online".
>>
>> This led me to think that "page_cgroup->memcg" may be changed
>> concurrently by some other part of the memcg code.
>> As far as I understand, the only candidate here is the "reclaim code
>> path" (I may be wrong).
>>
>> So I suspect there might be a race between the "memcg uncharge code"
>> and the "memcg reclaim code".
>>
>> Please give me your thoughts on this.
>> Thanks
>>
>> P.S.:
>>
>> Additional info:
>>
>> Kernel: rh7-3.10.0-327.10.1.vz7.12.14
>>
>> ******************************** 1st BT ********************************
>>
>> PID: 972445  TASK: ffff88065d53d8d0  CPU: 0   COMMAND: "httpd"
>>  #0 [ffff880224f37818] machine_kexec at ffffffff8105249b
>>  #1 [ffff880224f37878] crash_kexec at ffffffff81103532
>>  #2 [ffff880224f37948] oops_end at ffffffff81641628
>>  #3 [ffff880224f37970] die at ffffffff810184cb
>>  #4 [ffff880224f379a0] do_general_protection at ffffffff81640f24
>>  #5 [ffff880224f379d0] general_protection at ffffffff81640768
>>     [exception RIP: mem_cgroup_charge_statistics+19]
>>     RIP: ffffffff811e7733  RSP: ffff880224f37a80  RFLAGS: 00010202
>>     RAX: ffffffffffffffff  RBX: ffff8807b26f0110  RCX: 00000000ffffffff
>>     RDX: 79726f6765746163  RSI: ffffea000c9c0440  RDI: ffff8806a55662f8
>>     RBP: ffff880224f37a80   R8: 0000000000000000   R9: 0000000003808000
>>     R10: 00000000000000b8  R11: ffffea001eaa8980  R12: ffffea000c9c0440
>>     R13: 0000000000000001  R14: 0000000000000000  R15: ffff8806a5566000
>>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>  #6 [ffff880224f37a88] __mem_cgroup_uncharge_common at ffffffff811e9ddf
>>  #7 [ffff880224f37ac8] mem_cgroup_uncharge_page at ffffffff811ee99a
>>  #8 [ffff880224f37ad8] page_remove_rmap at ffffffff811b9ec9
>>  #9 [ffff880224f37b10] unmap_page_range at ffffffff811ab580
>> #10 [ffff880224f37bf8] unmap_single_vma at ffffffff811aba11
>> #11 [ffff880224f37c30] unmap_vmas at ffffffff811ace79
>> #12 [ffff880224f37c68] exit_mmap at ffffffff811b663c
>> #13 [ffff880224f37d18] mmput at ffffffff8107853b
>> #14 [ffff880224f37d38] flush_old_exec at ffffffff81202547
>> #15 [ffff880224f37d88] load_elf_binary at ffffffff8125883c
>> #16 [ffff880224f37e58] search_binary_handler at ffffffff81201c25
>> #17 [ffff880224f37ea0] do_execve_common at ffffffff812032b7
>> #18 [ffff880224f37f30] sys_execve at ffffffff81203619
>> #19 [ffff880224f37f50] stub_execve at ffffffff81649369
>>     RIP: 00007f54284b3287  RSP: 00007ffda57a0698  RFLAGS: 00000297
>>     RAX: 000000000000003b  RBX: 00000000037c5fe8  RCX: ffffffffffffffff
>>     RDX: 00000000037cf3f8  RSI: 00000000037ce5f8  RDI: 00007f5425fcabf1
>>     RBP: 00007ffda57a0750   R8: 0000000000000001   R9: 0000000000000000
>>
>>
>> ******************************** 2nd BT ********************************
>>
>> PID: 168440  TASK: ffff88001e31cc20  CPU: 18  COMMAND: "httpd"
>>  #0 [ffff88007255f838] machine_kexec at ffffffff8105249b
>>  #1 [ffff88007255f898] crash_kexec at ffffffff81103532
>>  #2 [ffff88007255f968] oops_end at ffffffff81641628
>>  #3 [ffff88007255f990] no_context at ffffffff8163222b
>>  #4 [ffff88007255f9e0] __bad_area_nosemaphore at ffffffff816322c1
>>  #5 [ffff88007255fa30] bad_area_nosemaphore at ffffffff8163244a
>>  #6 [ffff88007255fa40] __do_page_fault at ffffffff8164443e
>>  #7 [ffff88007255faa0] trace_do_page_fault at ffffffff81644673
>>  #8 [ffff88007255fad8] do_async_page_fault at ffffffff81643d59
>>  #9 [ffff88007255faf0] async_page_fault at ffffffff816407f8
>>     [exception RIP: memcg_check_events+435]
>>     RIP: ffffffff811e9b53  RSP: ffff88007255fba0  RFLAGS: 00010246
>>     RAX: 00000000f81ef81e  RBX: ffff8802106d5000  RCX: 0000000000000000
>>     RDX: 000000000000f81e  RSI: 0000000000020000  RDI: ffff8807aa2642e8
>>     RBP: ffff88007255fbf0   R8: 0000000000000202   R9: 0000000000000000
>>     R10: 0000000000000010  R11: ffff88007255ffd8  R12: ffff8807aa2642e0
>>     R13: 0000000000000410  R14: ffff8802073de700  R15: ffff8802106d5000
>>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>> #10 [ffff88007255fbf8] __mem_cgroup_uncharge_common at ffffffff811e9df2
>> #11 [ffff88007255fc38] mem_cgroup_uncharge_page at ffffffff811ee99a
>> #12 [ffff88007255fc48] page_remove_rmap at ffffffff811b9ec9
>> #13 [ffff88007255fc80] unmap_page_range at ffffffff811ab580
>> #14 [ffff88007255fd68] unmap_single_vma at ffffffff811aba11
>> #15 [ffff88007255fda0] unmap_vmas at ffffffff811ace79
>> #16 [ffff88007255fdd8] exit_mmap at ffffffff811b663c
>> #17 [ffff88007255fe88] mmput at ffffffff8107853b
>> #18 [ffff88007255fea8] do_exit at ffffffff81081d8c
>> #19 [ffff88007255ff40] do_group_exit at ffffffff8108266f
>> #20 [ffff88007255ff70] sys_exit_group at ffffffff810826e4
>> #21 [ffff88007255ff80] system_call_fastpath at ffffffff81648dc9
>>     RIP: 00007fc210ea4259  RSP: 00007ffe20580fa8  RFLAGS: 00010206
>>     RAX: 00000000000000e7  RBX: ffffffff81648dc9  RCX: 0000000000000000
>>
>> ******************************** 3rd BT ********************************
>>
>> PID: 1003121  TASK: ffff880036b58000  CPU: 1   COMMAND: "httpd"
>>  #0 [ffff880237a459c8] machine_kexec at ffffffff8105249b
>>  #1 [ffff880237a45a28] crash_kexec at ffffffff81103532
>>  #2 [ffff880237a45af8] panic at ffffffff816329b0
>>  #3 [ffff880237a45b78] watchdog_overflow_callback at ffffffff8112cee2
>>  #4 [ffff880237a45b88] __perf_event_overflow at ffffffff81171c11
>>  #5 [ffff880237a45c00] perf_event_overflow at ffffffff811726e4
>>  #6 [ffff880237a45c10] intel_pmu_handle_irq at ffffffff81032e98
>>  #7 [ffff880237a45e60] perf_event_nmi_handler at ffffffff8164206b
>>  #8 [ffff880237a45e80] nmi_handle at ffffffff816417b9
>>  #9 [ffff880237a45ec8] do_nmi at ffffffff816418d0
>> #10 [ffff880237a45ef0] end_repeat_nmi at ffffffff81640b93
>>     [exception RIP: _raw_spin_lock+58]
>>     RIP: ffffffff8163ff7a  RSP: ffff88003e16fa28  RFLAGS: 00000006
>>     RAX: 00000000000048f6  RBX: ffff8803edbab870  RCX: 0000000000006120
>>     RDX: 0000000000006362  RSI: 0000000000006362  RDI: ffff8803edbab898
>>     RBP: ffff88003e16fa28   R8: 0000000000000000   R9: 0000000002d98000
>>     R10: 0000000000002295  R11: ffffea0010d1f080  R12: 0000000000000000
>>     R13: ffff8803edbab870  R14: 0000000000000000  R15: ffff8803edbab898
>>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>> --- <NMI exception stack> ---
>> #11 [ffff88003e16fa28] _raw_spin_lock at ffffffff8163ff7a
>> #12 [ffff88003e16fa30] res_counter_uncharge_until at ffffffff81114df9
>> #13 [ffff88003e16fa78] res_counter_uncharge at ffffffff81114e73
>> #14 [ffff88003e16fa88] __mem_cgroup_uncharge_common at ffffffff811e9e7c
>> #15 [ffff88003e16fac8] mem_cgroup_uncharge_page at ffffffff811ee99a
>> #16 [ffff88003e16fad8] page_remove_rmap at ffffffff811b9ec9
>> #17 [ffff88003e16fb10] unmap_page_range at ffffffff811ab580
>> #18 [ffff88003e16fbf8] unmap_single_vma at ffffffff811aba11
>> #19 [ffff88003e16fc30] unmap_vmas at ffffffff811ace79
>> #20 [ffff88003e16fc68] exit_mmap at ffffffff811b663c
>> #21 [ffff88003e16fd18] mmput at ffffffff8107853b
>> #22 [ffff88003e16fd38] flush_old_exec at ffffffff81202547
>> #23 [ffff88003e16fd88] load_elf_binary at ffffffff8125883c
>> #24 [ffff88003e16fe58] search_binary_handler at ffffffff81201c25
>> #25 [ffff88003e16fea0] do_execve_common at ffffffff812032b7
>> #26 [ffff88003e16ff30] sys_execve at ffffffff81203619
>> #27 [ffff88003e16ff50] stub_execve at ffffffff81649369
>>     RIP: 00007f54e8341287  RSP: 00007fffcd0d22e8  RFLAGS: 00000297
>>     RAX: 000000000000003b  RBX: 0000000002d8b2a0  RCX: ffffffffffffffff
>>     RDX: 0000000002d8a810  RSI: 0000000002db4128  RDI: 00007f54e605cbf1
>>     RBP: 00007fffcd0d23a0   R8: 0000000000000001   R9: 0000000000000000
>>     R10: 00007fffcd0d2050  R11: 0000000000000297  R12: 0000000002d8a810
>>     R13: 0000000002db3a50  R14: 0000000002da8440  R15: 0000000000000000
>>     ORIG_RAX: 000000000000003b  CS: 0033  SS: 002b



-- 
Best regards,
Anatoly Stepanov | Kernel Developer
Skype: digitolman

CloudLinux.com  |  KernelCare.com  |  KuberDock.com
