[Users] Container networking broken after upgrade to 2.6.18-308.8.2.el5.028stab101.1, strange kernel errors. CPT ERR

P J pauljflists at gmail.com
Wed Jan 30 11:55:26 EST 2013


Hey Guys,

I've spent days trying to figure out what is going on here and I'm totally
out of ideas. I tried posting on the OpenVZ forum, but for some reason the
post has not been approved, so now I'm reaching out to you all.

I've even emailed OpenVZ about paid commercial support. I'm happy to pay for
help with this issue, as we have been using OpenVZ problem-free for many
years on many servers.

Here is the issue:

On one of our CentOS 5.9 x86_64 servers, I upgraded the kernel last weekend
from 2.6.18-194.8.1.el5.028stab070.4 to the latest
2.6.18-308.8.2.el5.028stab101.1. (We use ksplice, so I don't have to reboot
the HN too often.)

Installed VZ packages:

vzctl-core-4.1.2-1
ovzkernel-2.6.18-308.8.2.el5.028stab101.1
vzctl-4.1.2-1
ovzkernel-2.6.18-194.8.1.el5.028stab070.4
ovzkernel-devel-2.6.18-308.8.2.el5.028stab101.1
vzquota-3.1-1
ovzkernel-2.6.18-308.el5.028stab099.3
ovzkernel-2.6.18-238.9.1.el5.028stab089.1

Upon booting the new 2.6.18-308.8.2.el5.028stab101.1 kernel, I started
seeing strange kernel errors when starting the containers:


"Jan 26 20:46:06 vz02 kernel: CT: 103: started
Jan 26 20:46:08 vz02 kernel: CPT ERR: ffff81031e46a000,103 :NLMERR: -22
Jan 26 20:46:08 vz02 last message repeated 8 times
Jan 26 20:46:12 vz02 kernel: CT: 104: started
Jan 26 20:46:14 vz02 kernel: CPT ERR: ffff81031e46a000,104 :open_listening_socket: sock_create_kern: -97
Jan 26 20:46:14 vz02 kernel: CPT ERR: ffff81031e46a000,104 :rst_sockets: open_listening_socket: -97
Jan 26 20:46:14 vz02 kernel: CPT ERR: ffff81031e46a000,104 :rst_sockets: -97
Jan 26 20:46:14 vz02 kernel: CT: 104: stopped
Jan 26 20:46:15 vz02 kernel: CT: 104: started
Jan 26 20:46:17 vz02 kernel: CT: 105: started
Jan 26 20:46:19 vz02 kernel: CPT ERR: ffff81032156a000,105 :NLMERR: -22
Jan 26 20:46:19 vz02 last message repeated 8 times
Jan 26 20:46:19 vz02 kernel: CPT ERR: ffff81032156a000,105 :open_listening_socket: sock_create_kern: -97
Jan 26 20:46:19 vz02 kernel: CPT ERR: ffff81032156a000,105 :rst_sockets: open_listening_socket: -97
Jan 26 20:46:19 vz02 kernel: CPT ERR: ffff81032156a000,105 :rst_sockets: -97
"

--snip-- this goes on and on...
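
For anyone decoding the excerpt: the trailing negative numbers look like
negated Linux errno values. On that reading (an assumption, the log itself
does not say so), -22 is EINVAL and -97 is EAFNOSUPPORT, the error socket
creation returns when the requested address family is not available. Below is
a minimal Python sketch that groups the CPT ERR lines by container. The
function name and the regex are illustrative, and the sample lines are taken
from the excerpt with the mail client's line wrapping undone.

```python
import re
from collections import defaultdict

# Linux errno values; the kernel logs them negated.
LINUX_ERRNO = {
    22: "EINVAL",        # Invalid argument
    97: "EAFNOSUPPORT",  # Address family not supported by protocol
}

# Matches e.g. "... kernel: CPT ERR: ffff81031e46a000,104 :rst_sockets: -97"
CPT_ERR = re.compile(
    r"CPT ERR: [0-9a-f]+,(?P<ctid>\d+) :(?P<where>.*): (?P<code>-\d+)\s*$"
)

def summarize_cpt_errors(log_text):
    """Return {ctid: set of (location, errno name)} for every CPT ERR line."""
    summary = defaultdict(set)
    for line in log_text.splitlines():
        match = CPT_ERR.search(line)
        if not match:
            continue  # "CT: NNN: started" and similar lines are skipped
        code = -int(match.group("code"))
        name = LINUX_ERRNO.get(code, "errno %d" % code)
        summary[match.group("ctid")].add((match.group("where"), name))
    return dict(summary)

sample = """\
Jan 26 20:46:08 vz02 kernel: CPT ERR: ffff81031e46a000,103 :NLMERR: -22
Jan 26 20:46:14 vz02 kernel: CPT ERR: ffff81031e46a000,104 :open_listening_socket: sock_create_kern: -97
Jan 26 20:46:14 vz02 kernel: CPT ERR: ffff81031e46a000,104 :rst_sockets: -97
"""

for ctid, errors in sorted(summarize_cpt_errors(sample).items()):
    print(ctid, sorted(errors))
```

Run against the full /var/log/messages, this would show at a glance which
containers hit which error and in which restore step.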

The containers did start up, but networking was not working. I could not
ping them, they could not ping out.
Their interfaces were up, most of them use venet.

So I did a vzctl restart on each container. They threw the same error
messages, but networking started working again.
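
The check-then-restart step above could be scripted roughly as follows. This
is only a sketch under several assumptions: Python 3.7+ is available on the
hardware node (it is not part of stock CentOS 5), vzlist and vzctl are on the
PATH, and a single unanswered ping is a good enough test for "networking is
down". The helper names are hypothetical, and the sample vzlist output is made
up for illustration (1.2.3.4 is the placeholder address used in this post).

```python
import subprocess

def parse_vzlist(output):
    """Parse `vzlist -H -o ctid,ip` output into {ctid: [ip, ...]}."""
    containers = {}
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # First column is the container ID; "-" means no IP is assigned.
        containers[fields[0]] = [ip for ip in fields[1:] if ip != "-"]
    return containers

def is_reachable(ip, timeout_s=2):
    """Send one ICMP echo with the system ping; True if it was answered."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def restart_unreachable(dry_run=True):
    """Restart each running container none of whose IPs answers a ping."""
    listing = subprocess.run(
        ["vzlist", "-H", "-o", "ctid,ip"],
        capture_output=True, text=True, check=True,
    ).stdout
    for ctid, ips in parse_vzlist(listing).items():
        if ips and not any(is_reachable(ip) for ip in ips):
            cmd = ["vzctl", "restart", ctid]
            print("would run:" if dry_run else "running:", " ".join(cmd))
            if not dry_run:
                subprocess.run(cmd, check=False)

# Hypothetical vzlist output, for illustration only.
sample_listing = "       103 1.2.3.4\n       104 -\n"
print(parse_vzlist(sample_listing))
```

Calling restart_unreachable() with the default dry_run=True only prints the
vzctl commands it would run, which is the safer way to try it first.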

What makes even less sense is that we have other identical servers: same
hardware, same version of CentOS, same VZ kernel, and no issues. What am I
missing here?

Here is a copy of our vz.conf, nothing fancy:

"
VIRTUOZZO=yes
LOCKDIR=/vz/lock
DUMPDIR=/vz/dump
VE0CPUUNITS=1000

## Logging parameters
LOGGING=yes
LOGFILE=/var/log/vzctl.log
LOG_LEVEL=0
VERBOSE=0

## Disk quota parameters
DISK_QUOTA=yes
VZFASTBOOT=no

# Disable module loading. If set, vz initscript do not load any modules.
#MODULES_DISABLED=yes

# The name of the device whose IP address will be used as source IP for CT.
# By default automatically assigned.
#VE_ROUTE_SRC_DEV="eth0"

# Controls which interfaces to send ARP requests and modify ARP tables on.
NEIGHBOUR_DEVS=all
ERROR_ON_ARPFAIL="no"


## Template parameters
TEMPLATE=/vz/template

## Defaults for containers
VE_ROOT=/vz/root/$VEID
VE_PRIVATE=/vz/private/$VEID
CONFIGFILE="vps.basic"
DEF_OSTEMPLATE="fedora-core-4"

## Load vzwdog module
VZWDOG="no"

## IPv4 iptables kernel modules
#IPTABLES="ipt_REJECT ipt_tos ipt_limit ipt_multiport iptable_filter
iptable_mangle ipt_TCPMSS ipt_tcpmss ipt_ttl ipt_length ipt_state
iptable_nat "

IPTABLES="iptable_filter iptable_mangle ipt_limit ipt_multiport ipt_tos
ipt_TOS ipt_REJECT ipt_TCPMSS ipt_tcpmss ipt_ttl ipt_LOG ipt_length
ip_conntrack ip_conntrack_ftp ip_conntrack_irc ipt_conntrack ipt_state
ipt_helper iptable_nat ip_nat_ftp ip_nat_irc ipt_REDIRECT ipt_MASQUERADE"

## Enable IPv6
IPV6="no"

## IPv6 ip6tables kernel modules
IP6TABLES="ip6_tables ip6table_filter ip6table_mangle ip6t_REJECT"
"
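
Since otherwise identical servers work fine, one cheap check is to diff this
vz.conf against the one from a working host. A minimal Python sketch, assuming
the files use plain shell KEY=value syntax as above; the function names are
illustrative, and the two inline configs are shortened, hypothetical stand-ins
(the "working" values are invented purely to show the output format).

```python
import shlex

def parse_shell_config(text):
    """Parse shell-style KEY=value settings, ignoring '#' comments."""
    settings = {}
    # shlex handles quoting, so a quoted value may span several lines.
    for token in shlex.split(text, comments=True):
        key, sep, value = token.partition("=")
        if sep:
            # Collapse whitespace so line wrapping does not count as a change.
            settings[key] = " ".join(value.split())
    return settings

def diff_configs(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

# Hypothetical, shortened stand-ins for the two hosts' /etc/vz/vz.conf.
broken_host = '''
NEIGHBOUR_DEVS=all
IPV6="no"
IPTABLES="iptable_filter
iptable_mangle"
'''
working_host = '''
# same settings, wrapped differently, one value changed
NEIGHBOUR_DEVS=all
IPV6="yes"
IPTABLES="iptable_filter iptable_mangle"
'''

print(diff_configs(parse_shell_config(broken_host),
                   parse_shell_config(working_host)))
# prints {'IPV6': ('no', 'yes')}
```

Note that the commented-out IPTABLES line as it appears in this mail is
wrapped across three lines; the sketch expects the file as it is on disk, not
the mail-wrapped copy.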

Here is an example container config.

"
ONBOOT="yes"

NUMPROC="5102:5102"
AVNUMPROC="2551:2551"
NUMTCPSOCK="5102:5102"
NUMOTHERSOCK="5102:5102"
VMGUARPAGES="262144:9223372036854775807"

# Secondary parameters
KMEMSIZE="209012940:229914234"
TCPSNDBUF="48773188:69670980"
TCPRCVBUF="48773188:69670980"
OTHERSOCKBUF="24386594:45284386"
DGRAMRCVBUF="24386594:24386594"
OOMGUARPAGES="151485:9223372036854775807"
PRIVVMPAGES="256000:262140"

# Auxiliary parameters
LOCKEDPAGES="10205:10205"
SHMPAGES="90891:90891"
PHYSPAGES="0:9223372036854775807"
NUMFILE="81632:81632"
NUMFLOCK="1000:1100"
NUMPTY="510:510"
NUMSIGINFO="1024:1024"
DCACHESIZE="45650516:47020032"

NUMIPTENT="3072:3072"
# Disk Resource Limits
DISKINODES="4560000:4800000"
DISKSPACE="39845888:41943040"

# Quota Resource Limits
QUOTATIME="0"
QUOTAUGIDLIMIT="3000"

# CPU Resource Limits
CPUUNITS="1000"
#RATE="eth0:1:6000"

# IPTables config
IPTABLES="ipt_REJECT ipt_tos ipt_limit ipt_multiport iptable_filter
iptable_mangle ipt_TCPMSS ipt_tcpmss ipt_ttl ipt_length ip_conntrack
ip_conntrack_ftp ipt_LOG ipt_conntrack ipt_helper ipt_state iptable_nat
ip_nat_ftp ipt_TOS ipt_REDIRECT"

# Default Devices
#DEVICES="c:10:229:rw c:10:200:rw "

IP_ADDRESS="1.2.3.4"
HOSTNAME="nscache01.xxx.com"
VE_ROOT="/vz/root/$VEID"
VE_PRIVATE="/vz/private/$VEID"
OSTEMPLATE="centos-5-x86_64"
ORIGIN_SAMPLE="512mb"
NAMESERVER="1.2.3.4"
NAME="nscache01-xxx"
"
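
As a quick sanity check on the two IPTABLES lists quoted in this post, the
sketch below tests whether the container asks for any module that is absent
from the list in vz.conf. It is plain set arithmetic on the strings as posted;
whether the two lists need to agree on this kernel is an assumption, not
something established here. The function name is illustrative.

```python
# IPTABLES value from the vz.conf quoted earlier in this post.
HOST_IPTABLES = """iptable_filter iptable_mangle ipt_limit ipt_multiport ipt_tos
ipt_TOS ipt_REJECT ipt_TCPMSS ipt_tcpmss ipt_ttl ipt_LOG ipt_length
ip_conntrack ip_conntrack_ftp ip_conntrack_irc ipt_conntrack ipt_state
ipt_helper iptable_nat ip_nat_ftp ip_nat_irc ipt_REDIRECT ipt_MASQUERADE"""

# IPTABLES value from the example container config.
CT_IPTABLES = """ipt_REJECT ipt_tos ipt_limit ipt_multiport iptable_filter
iptable_mangle ipt_TCPMSS ipt_tcpmss ipt_ttl ipt_length ip_conntrack
ip_conntrack_ftp ipt_LOG ipt_conntrack ipt_helper ipt_state iptable_nat
ip_nat_ftp ipt_TOS ipt_REDIRECT"""

def only_in_first(first, second):
    """Return the sorted module names listed in `first` but not in `second`."""
    return sorted(set(first.split()) - set(second.split()))

print(only_in_first(CT_IPTABLES, HOST_IPTABLES))
# prints []
print(only_in_first(HOST_IPTABLES, CT_IPTABLES))
# prints ['ip_conntrack_irc', 'ip_nat_irc', 'ipt_MASQUERADE']
```

So for this container, every requested module also appears in the vz.conf
list, which makes a simple mismatch between the two lists an unlikely cause.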

Any ideas? Configuration issue? Kernel bug?

I also noticed a container that used to run OpenVPN no longer works, so
even though networking is now "working" something is still going on...

The one thing I did before upgrading the kernel was remove ploop, as there
were some dependency issues.
We don't use any ploop-based containers, so I don't believe this should
affect us? We did the same on another server and had no issue...

If anyone who provides paid support, whether OpenVZ developers or anyone
else, wants to respond to me directly, I'm of course happy to pay. I don't
expect someone to spend hours troubleshooting something for free.

Or, of course if any other list readers have any ideas please let me know!
:)

Thanks in advance for your help.

-PJF