[Users] Kernel panic on restore

Roman Haefeli reduzent at gmail.com
Wed Mar 26 09:35:55 PDT 2014


Hi all

I managed to crash one hostnode of our test cluster while restoring a
CT.

Hostnodes:
* 3 hostnodes running Debian 7 amd64 with OpenVZ kernel
* Kernel:  042stab085.20
* VE_ROOT / VE_PRIVATE is on an NFS mount shared by nodes

Test-CT:
* Debian 7 from self-made template
* amd64
* ploop
* runs mysql server and apache2 web server
* runs scripts to cause load on mysql and web server

For testing purposes, I was online-migrating the test CT between nodes
once every 30 seconds. This went fine for a while, but after about 20
cycles one of the hostnodes crashed while trying to restore the CT.
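
For anyone who wants to reproduce this, the migration loop described above can be approximated with the sketch below. It is only a sketch under assumptions: the post does not say which tool performed the migration, so `vzmigrate --online` over ssh is assumed here; the node names `virtuetest1` and `virtuetest2` are hypothetical (only `virtuetest3` appears in the log); and the CT ID 54 is taken from the "CT: 54: started" log line. It defaults to a dry run that only prints the commands.

```python
import itertools
import subprocess
import time


def migration_commands(ctid, nodes, cycles):
    """Yield (source, command) pairs that bounce a CT around the nodes.

    The CT is assumed to start on nodes[0]; each cycle moves it to the
    next node in the list, wrapping around at the end.
    """
    ring = itertools.cycle(nodes)
    source = next(ring)
    for _ in range(cycles):
        target = next(ring)
        # Assumption: the migration is started on the current source node
        # with "vzmigrate --online <destination> <CTID>".
        yield source, ["ssh", source, "vzmigrate", "--online", target, str(ctid)]
        source = target


def run(ctid, nodes, cycles, interval=30, dry_run=True):
    """Run (or just print) one migration per cycle, pausing between them."""
    for _source, cmd in migration_commands(ctid, nodes, cycles):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop at the first failed migration.
            subprocess.run(cmd, check=True)
            time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical node names; replace them with the real hostnodes.
    run(54, ["virtuetest1", "virtuetest2", "virtuetest3"], cycles=4)
```

Note that sleeping after each migration gives roughly, not exactly, one migration every 30 seconds, since the migration itself also takes time.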

This issue is most likely not specific to this kernel version. I got
similar crashes with older versions as well, but was too lazy to report
them.

I'm aware that migrating a CT every 30 seconds might be considered
extreme. However, we experienced similar crashes on production systems
during online migration, and those we migrate every few weeks at most.
Before using online migration in production again, I'd like to verify
that the most extreme situation I can think of is handled gracefully by
the kernel.

Let me know if further information is needed. Here is the part of the
syslog I was able to capture at the time of the crash:

Mar 26 16:17:05 virtuetest3 kernel: [ 1000.279251]  ploop46524: p1
Mar 26 16:17:05 virtuetest3 kernel: [ 1000.289409]  ploop46524: p1
Mar 26 16:17:05 virtuetest3 kernel: [ 1000.313031] EXT4-fs (ploop46524p1): mounted filesystem with ordered data mode. Opts: 
Mar 26 16:17:05 virtuetest3 kernel: [ 1000.314840] EXT4-fs (ploop46524p1): loaded balloon from 12 (0 blocks)
Mar 26 16:17:05 virtuetest3 kernel: [ 1000.383837] lo: Dropping TSO features since no CSUM feature.
Mar 26 16:17:05 virtuetest3 kernel: [ 1000.384787] CT: 54: started
Mar 26 16:17:05 virtuetest3 kernel: [ 1000.399195] device veth54.0 entered promiscuous mode
Mar 26 16:17:05 virtuetest3 kernel: [ 1000.399286] br_206: port 2(veth54.0) entering forwarding state
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660051] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660232] IP: [<ffffffff814adcfe>] inet_csk_reqsk_queue_prune+0x29e/0x2c0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660372] PGD 0 
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660419] Oops: 0000 [#1] SMP 
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660498] last sysfs file: /sys/devices/virtual/block/ploop46524/removable
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660616] CPU 0 
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660657] Modules linked in: vzethdev vznetdev pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vzrst nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 vzcpt nf_conntrack vziolimit vzmon xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit xt_dscp ipt_REJECT ip_tables nfs fscache vzdquota vzdev vzevent ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse nfsd nfs_acl auth_rpcgss lockd sunrpc ipv6 bridge 8021q garp stp llc snd_pcsp radeon iTCO_wdt iTCO_vendor_support snd_pcm ttm snd_page_alloc drm_kms_helper snd_timer lpc_ich i5000_edac drm ioatdma mfd_core edac_core snd i2c_algo_bit i5k_amb i2c_core soundcore serio_raw dca shpchp ext4 jbd2 mbcache sg sd_mod crc_t10dif ata_generic pata_acpi mptsas mptscsih bnx2 ata_piix mptbase scsi_transport_sas [last unloaded: ploop]
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] 
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Pid: 0, comm: swapper veid: 0 Not tainted 2.6.32-openvz-042stab085.20-amd64 #1 042stab085_20 IBM IBM eServer BladeCenter HS21 -[7995L3G]-/Server Blade
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RIP: 0010:[<ffffffff814adcfe>]  [<ffffffff814adcfe>] inet_csk_reqsk_queue_prune+0x29e/0x2c0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RSP: 0018:ffff880028203d50  EFLAGS: 00010202
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RAX: 0000000000000000 RBX: 00000001000ab4dc RCX: 0000000000000000
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RDX: ffff88034c05a500 RSI: ffff880362581c80 RDI: ffff880366f2b080
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RBP: ffff880028203dc0 R08: ffff88002821c320 R09: 0000000000000000
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880366f2b080
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] R13: 000000000001d4c0 R14: ffff880366f2b3c0 R15: ffff880362581c80
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] CR2: 0000000000000018 CR3: 0000000349579000 CR4: 00000000000007f0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Process swapper (pid: 0, veid: 0, threadinfo ffffffff81a00000, task ffffffff81a8d020)
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Stack:
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  0000840bffff23fa ffff88034c05a500 00000000000000c8 0000000181c0f7a8
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] <d> 0000009d0000002f 00000000000003e8 ffff88034c05a000 000000058146c918
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] <d> ffffc900035d9000 ffff880366f2b080 ffff880366f2b0c8 ffffffff81aaa180
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Call Trace:
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  <IRQ> 
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff814c2757>] tcp_keepalive_timer+0x187/0x2e0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81089b7c>] run_timer_softirq+0x1bc/0x380
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff814c25d0>] ? tcp_keepalive_timer+0x0/0x2e0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff8107f3c3>] __do_softirq+0x103/0x260
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff8100c44c>] call_softirq+0x1c/0x30
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81010195>] do_softirq+0x65/0xa0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff8107f1ed>] irq_exit+0xcd/0xd0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81539515>] do_IRQ+0x75/0xf0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff8100ba93>] ret_from_intr+0x0/0x11
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  <EOI> 
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81016ce7>] ? mwait_idle+0x77/0xd0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81535a9a>] ? atomic_notifier_call_chain+0x1a/0x20
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff8100a013>] cpu_idle+0xb3/0x110
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81514db5>] rest_init+0x85/0x90
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81c31f80>] start_kernel+0x412/0x41e
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81c3133a>] x86_64_start_reservations+0x125/0x129
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81c31453>] x86_64_start_kernel+0x115/0x124
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Code: ff 0f 1f 40 00 4c 89 fa e9 50 fe ff ff 41 f6 47 49 10 74 09 31 c9 e9 6f ff ff ff 66 90 49 8b 47 20 4c 89 fe 48 89 55 98 4c 89 e7 <ff> 50 18 85 c0 48 8b 55 98 0f 84 68 fe ff ff 41 f6 47 49 10 0f 
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RIP  [<ffffffff814adcfe>] inet_csk_reqsk_queue_prune+0x29e/0x2c0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  RSP <ffff880028203d50>
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] CR2: 0000000000000018
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Tainting kernel with flag 0x7
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Pid: 0, comm: swapper veid: 0 Not tainted 2.6.32-openvz-042stab085.20-amd64 #1
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Call Trace:
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  <IRQ>  [<ffffffff81075e65>] ? add_taint+0x35/0x70
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff815339b4>] ? oops_end+0x54/0x100
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff8104af5b>] ? no_context+0xfb/0x260
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff8104b1d5>] ? __bad_area_nosemaphore+0x115/0x1e0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffffa034fff8>] ? br_nf_pre_routing_finish+0x238/0x350 [bridge]
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff8104b2b3>] ? bad_area_nosemaphore+0x13/0x20
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff8104ba02>] ? __do_page_fault+0x322/0x490
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffffa03505ba>] ? br_nf_pre_routing+0x4aa/0x7e0 [bridge]
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81497ab9>] ? nf_iterate+0x69/0xb0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffffa0349e00>] ? br_handle_frame_finish+0x0/0x320 [bridge]
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81497c76>] ? nf_hook_slow+0x76/0x120
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffffa0349e00>] ? br_handle_frame_finish+0x0/0x320 [bridge]
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff8153597e>] ? do_page_fault+0x3e/0xa0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81532d05>] ? page_fault+0x25/0x30
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff814adcfe>] ? inet_csk_reqsk_queue_prune+0x29e/0x2c0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff814c2757>] ? tcp_keepalive_timer+0x187/0x2e0
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81089b7c>] ? run_timer_softirq+0x1bc/0x380
Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff814c25d0>] ? tcp_keepalive_tim
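
To make the trace above easier to compare against other reports, here is a small sketch that pulls the key facts out of an oops: the faulting address, the symbol and offset from the "IP:" line, and the instruction bytes marked with `<..>` on the "Code:" line. The function name `summarize_oops` and the `SAMPLE` excerpt are illustrative only; the sample lines are copied from the log above with the syslog prefix removed.

```python
import re

# Three lines copied from the oops above (syslog prefix stripped).
SAMPLE = """\
BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: [<ffffffff814adcfe>] inet_csk_reqsk_queue_prune+0x29e/0x2c0
Code: ff 0f 1f 40 00 4c 89 fa e9 50 fe ff ff 41 f6 47 49 10 74 09 31 c9 e9 6f ff ff ff 66 90 49 8b 47 20 4c 89 fe 48 89 55 98 4c 89 e7 <ff> 50 18 85 c0 48 8b 55 98 0f 84 68 fe ff ff 41 f6 47 49 10 0f
"""


def summarize_oops(log_text):
    """Return fault address, faulting symbol and marked opcode bytes."""
    summary = {}

    m = re.search(r"NULL pointer dereference at ([0-9a-f]+)", log_text)
    if m:
        summary["fault_address"] = int(m.group(1), 16)

    # Matches the "IP: [<addr>] symbol+0xOFF/0xSIZE" line.
    m = re.search(r"IP: \[<[0-9a-f]+>\] (\w+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)", log_text)
    if m:
        summary["symbol"] = m.group(1)
        summary["offset"] = int(m.group(2), 16)
        summary["size"] = int(m.group(3), 16)

    m = re.search(r"Code: (.*)", log_text)
    if m:
        tokens = m.group(1).split()
        for i, tok in enumerate(tokens):
            if tok.startswith("<"):
                # <xx> marks the first byte of the faulting instruction;
                # keep it together with the two bytes that follow.
                summary["faulting_bytes"] = [tok.strip("<>")] + tokens[i + 1:i + 3]
                break
    return summary


if __name__ == "__main__":
    print(summarize_oops(SAMPLE))
```

The marked bytes `ff 50 18` are the x86-64 encoding of `call *0x18(%rax)`. With RAX equal to 0 in the register dump, that is an indirect call through a NULL pointer plus 0x18, which matches the reported fault address (CR2) of 0x18. Which structure the NULL pointer belongs to cannot be determined from the log alone.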



