[Users] Kernel panic on restore

Andrew Vagin avagin at parallels.com
Wed Mar 26 11:30:17 PDT 2014


Hello Roman,

Could you file a bug at bugzilla.openvz.org and assign it to me?

Thanks.

On Wed, Mar 26, 2014 at 05:35:55PM +0100, Roman Haefeli wrote:
> Hi all
> 
> I managed to crash one hostnode of our testing cluster while
> restoring a CT.
> 
> Hostnodes:
> * 3 hostnodes running Debian 7 amd64 with the OpenVZ kernel
> * Kernel: 042stab085.20
> * VE_ROOT / VE_PRIVATE is on an NFS mount shared by all nodes
> 
> Test-CT:
> * Debian 7 from self-made template
> * amd64
> * ploop
> * runs mysql server and apache2 web server
> * runs scripts to cause load on mysql and web server
> 
> For testing purposes, I was online-migrating the test-CT between nodes
> once every 30 seconds. This went fine for a while, but after a few
> cycles (~20) one of the hostnodes crashed when trying to restore the CT.
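
A loop like the one described above could be sketched roughly as follows. This is only a sketch under assumptions: the original message does not say which tool was used, so `vzmigrate --online` run over ssh on the current source node is a guess; the hostnames `virtuetest1` and `virtuetest2` are hypothetical (only `virtuetest3` appears in the log), and the CT ID 54 is taken from the "CT: 54: started" log line.

```python
import itertools
import subprocess
import time


def migration_commands(nodes, ctid, cycles):
    """Yield (source, argv) pairs for a round-robin online migration.

    The CT is assumed to start on nodes[0]. Each step moves it to the
    next node in the list and wraps around at the end.
    """
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to migrate between")
    ring = itertools.cycle(nodes)
    source = next(ring)
    for _ in range(cycles):
        dest = next(ring)
        # Assumption: vzmigrate is started on the node currently hosting the CT.
        argv = ["ssh", f"root@{source}", "vzmigrate", "--online", dest, str(ctid)]
        yield source, argv
        source = dest


def run(nodes, ctid, cycles, interval=30):
    """Run the migrations one by one, pausing `interval` seconds in between."""
    for source, argv in migration_commands(nodes, ctid, cycles):
        print(f"migrating CT {ctid} away from {source}")
        subprocess.run(argv, check=True)  # stop at the first failed migration
        time.sleep(interval)
```

Calling `run(["virtuetest1", "virtuetest2", "virtuetest3"], 54, 20)` would attempt 20 migrations, 30 seconds apart. Building the commands separately from running them makes it possible to inspect the sequence without touching a hostnode.
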
> 
> This issue is most likely not specific to the kernel version. I got
> similar crashes with older versions as well, but was too lazy to report
> them. 
> 
> I'm aware that migrating a CT every 30 seconds might be considered
> extreme. However, we have seen similar crashes on production systems
> during online migration, and there we migrate every few weeks at
> most. Before using online migration in production again, I'd like to
> verify that the kernel gracefully handles the most extreme situation
> I can think of.
> 
> Here is the part of the syslog I was able to capture at the time of
> the crash (let me know if further information is needed):
> 
> Mar 26 16:17:05 virtuetest3 kernel: [ 1000.279251]  ploop46524: p1
> Mar 26 16:17:05 virtuetest3 kernel: [ 1000.289409]  ploop46524: p1
> Mar 26 16:17:05 virtuetest3 kernel: [ 1000.313031] EXT4-fs (ploop46524p1): mounted filesystem with ordered data mode. Opts: 
> Mar 26 16:17:05 virtuetest3 kernel: [ 1000.314840] EXT4-fs (ploop46524p1): loaded balloon from 12 (0 blocks)
> Mar 26 16:17:05 virtuetest3 kernel: [ 1000.383837] lo: Dropping TSO features since no CSUM feature.
> Mar 26 16:17:05 virtuetest3 kernel: [ 1000.384787] CT: 54: started
> Mar 26 16:17:05 virtuetest3 kernel: [ 1000.399195] device veth54.0 entered promiscuous mode
> Mar 26 16:17:05 virtuetest3 kernel: [ 1000.399286] br_206: port 2(veth54.0) entering forwarding state
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660051] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660232] IP: [<ffffffff814adcfe>] inet_csk_reqsk_queue_prune+0x29e/0x2c0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660372] PGD 0 
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660419] Oops: 0000 [#1] SMP 
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660498] last sysfs file: /sys/devices/virtual/block/ploop46524/removable
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660616] CPU 0 
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660657] Modules linked in: vzethdev vznetdev pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vzrst nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 vzcpt nf_conntrack vziolimit vzmon xt_length xt_hl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit xt_dscp ipt_REJECT ip_tables nfs fscache vzdquota vzdev vzevent ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi fuse nfsd nfs_acl auth_rpcgss lockd sunrpc ipv6 bridge 8021q garp stp llc snd_pcsp radeon iTCO_wdt iTCO_vendor_support snd_pcm ttm snd_page_alloc drm_kms_helper snd_timer lpc_ich i5000_edac drm ioatdma mfd_core edac_core snd i2c_algo_bit i5k_amb i2c_core soundcore serio_raw dca shpchp ext4 jbd2 mbcache sg sd_mod crc_t10dif ata_generic pata_acpi mptsas mptscsih bnx2 ata_piix mptbase scsi_transport_sas [last unloaded: ploop]
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] 
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Pid: 0, comm: swapper veid: 0 Not tainted 2.6.32-openvz-042stab085.20-amd64 #1 042stab085_20 IBM IBM eServer BladeCenter HS21 -[7995L3G]-/Server Blade
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RIP: 0010:[<ffffffff814adcfe>]  [<ffffffff814adcfe>] inet_csk_reqsk_queue_prune+0x29e/0x2c0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RSP: 0018:ffff880028203d50  EFLAGS: 00010202
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RAX: 0000000000000000 RBX: 00000001000ab4dc RCX: 0000000000000000
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RDX: ffff88034c05a500 RSI: ffff880362581c80 RDI: ffff880366f2b080
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RBP: ffff880028203dc0 R08: ffff88002821c320 R09: 0000000000000000
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880366f2b080
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] R13: 000000000001d4c0 R14: ffff880366f2b3c0 R15: ffff880362581c80
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] CR2: 0000000000000018 CR3: 0000000349579000 CR4: 00000000000007f0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Process swapper (pid: 0, veid: 0, threadinfo ffffffff81a00000, task ffffffff81a8d020)
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Stack:
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  0000840bffff23fa ffff88034c05a500 00000000000000c8 0000000181c0f7a8
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] <d> 0000009d0000002f 00000000000003e8 ffff88034c05a000 000000058146c918
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] <d> ffffc900035d9000 ffff880366f2b080 ffff880366f2b0c8 ffffffff81aaa180
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Call Trace:
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  <IRQ> 
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff814c2757>] tcp_keepalive_timer+0x187/0x2e0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81089b7c>] run_timer_softirq+0x1bc/0x380
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff814c25d0>] ? tcp_keepalive_timer+0x0/0x2e0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff8107f3c3>] __do_softirq+0x103/0x260
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff8100c44c>] call_softirq+0x1c/0x30
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81010195>] do_softirq+0x65/0xa0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff8107f1ed>] irq_exit+0xcd/0xd0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81539515>] do_IRQ+0x75/0xf0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff8100ba93>] ret_from_intr+0x0/0x11
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  <EOI> 
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81016ce7>] ? mwait_idle+0x77/0xd0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81535a9a>] ? atomic_notifier_call_chain+0x1a/0x20
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff8100a013>] cpu_idle+0xb3/0x110
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81514db5>] rest_init+0x85/0x90
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81c31f80>] start_kernel+0x412/0x41e
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81c3133a>] x86_64_start_reservations+0x125/0x129
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81c31453>] x86_64_start_kernel+0x115/0x124
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Code: ff 0f 1f 40 00 4c 89 fa e9 50 fe ff ff 41 f6 47 49 10 74 09 31 c9 e9 6f ff ff ff 66 90 49 8b 47 20 4c 89 fe 48 89 55 98 4c 89 e7 <ff> 50 18 85 c0 48 8b 55 98 0f 84 68 fe ff ff 41 f6 47 49 10 0f 
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] RIP  [<ffffffff814adcfe>] inet_csk_reqsk_queue_prune+0x29e/0x2c0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  RSP <ffff880028203d50>
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] CR2: 0000000000000018
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Tainting kernel with flag 0x7
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Pid: 0, comm: swapper veid: 0 Not tainted 2.6.32-openvz-042stab085.20-amd64 #1
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Call Trace:
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  <IRQ>  [<ffffffff81075e65>] ? add_taint+0x35/0x70
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff815339b4>] ? oops_end+0x54/0x100
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff8104af5b>] ? no_context+0xfb/0x260
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff8104b1d5>] ? __bad_area_nosemaphore+0x115/0x1e0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffffa034fff8>] ? br_nf_pre_routing_finish+0x238/0x350 [bridge]
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff8104b2b3>] ? bad_area_nosemaphore+0x13/0x20
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff8104ba02>] ? __do_page_fault+0x322/0x490
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffffa03505ba>] ? br_nf_pre_routing+0x4aa/0x7e0 [bridge]
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81497ab9>] ? nf_iterate+0x69/0xb0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffffa0349e00>] ? br_handle_frame_finish+0x0/0x320 [bridge]
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81497c76>] ? nf_hook_slow+0x76/0x120
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffffa0349e00>] ? br_handle_frame_finish+0x0/0x320 [bridge]
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff8153597e>] ? do_page_fault+0x3e/0xa0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81532d05>] ? page_fault+0x25/0x30
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff814adcfe>] ? inet_csk_reqsk_queue_prune+0x29e/0x2c0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff814c2757>] ? tcp_keepalive_timer+0x187/0x2e0
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff81089b7c>] ? run_timer_softirq+0x1bc/0x380
> Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003]  [<ffffffff814c25d0>] ? tcp_keepalive_tim
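
For a bug report, the few identifying fields of an oops like the one above can be pulled out of the syslog with a small script. The following is a minimal sketch, not an established tool: the function name `summarize_oops` and the choice of fields are my own, and the three sample lines are copied from the log above.

```python
import re

# One pattern per field; only the first match of each is kept.
PATTERNS = {
    "bug": re.compile(r"BUG: (.+)"),
    # \b keeps this from matching the later "RIP: 0010:[<...>]" line.
    "fault_ip": re.compile(r"\bIP: \[<[0-9a-f]+>\] (\S+)"),
    "kernel": re.compile(r"Not tainted (\S+)"),
}


def summarize_oops(lines):
    """Return the first BUG message, faulting symbol and kernel version found."""
    summary = {}
    for line in lines:
        for key, pattern in PATTERNS.items():
            if key in summary:
                continue
            match = pattern.search(line)
            if match:
                summary[key] = match.group(1).strip()
    return summary


SAMPLE = [
    "Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660051] BUG: unable to handle "
    "kernel NULL pointer dereference at 0000000000000018",
    "Mar 26 16:17:07 virtuetest3 kernel: [ 1001.660232] IP: [<ffffffff814adcfe>] "
    "inet_csk_reqsk_queue_prune+0x29e/0x2c0",
    "Mar 26 16:17:07 virtuetest3 kernel: [ 1001.661003] Pid: 0, comm: swapper "
    "veid: 0 Not tainted 2.6.32-openvz-042stab085.20-amd64 #1",
]
```

`summarize_oops(SAMPLE)` returns a dict with the keys `bug`, `fault_ip` and `kernel`, which is usually enough for a one-line bug summary; the full trace should still be attached to the report.
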
> 
> 

