[Users] Fwd: Need help with hanging servers

Brian Moon brian at moonspot.net
Tue Jul 20 09:40:54 EDT 2010


Just a follow-up. We upgraded to 2.6.27-openvz-kuindzhi.1 on Gentoo and 
have had no crashes since.

Full disclosure for the archives in case someone else has problems.

System Information
         Manufacturer: Dell Inc.
         Product Name: PowerEdge 2950

01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)


-------- Original Message --------
Subject: Need help with hanging servers
Date: Tue, 06 Jul 2010 10:33:36 -0500
From: Brian Moon <brian at moonspot.net>
To: users at openvz.org <users at openvz.org>

Hi,

Been lurking on the list for a bit before I posted. We are relatively
new and light OpenVZ users. We have three physical boxes that use
OpenVZ. One is the server that is home to our developers' environment.
Each developer has his own container. We have the occasional container
stop responding due to too many resources used, but the entire server is
fine. That is almost always the devs' fault.

The other two installs we have are in production. They are sort of
miscellaneous installation boxes. Things like cacti, nagios, misc web
apps (web mail, etc.) as well as having containers for custom outgoing
SMTP servers and running Gearman workers written in PHP on a dedicated
container.

The management of OpenVZ is great. We love it. We just have one problem.
On no regular schedule, the two production servers will hang. And it is
a weird hang. They still respond to ping. And TCP connections answer
(connect) but don't respond. So, our monitoring hangs for a while
waiting on an answer. Likewise our load balancers don't see them as down
for a while after they are not responding. It is just weird. I am hoping
that is some clue for someone. There is nothing in syslog on the host
server or any containers. There is nothing on the console. It sounds
like a resource issue. We have tried moving containers around, leaving
some off for a while, and other stuff to find the offending container.
But, nothing has worked. One or the other locks up every 5-6 days. Not
on a schedule like it is a particular cron job causing the problem.
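
For anyone who wants to catch this kind of hang from the monitoring side,
here is a minimal sketch of a probe that tells "connection accepted but no
answer" apart from "connection refused". The function name, the result
labels, and the default timeout are assumptions made for illustration, not
part of any monitoring tool mentioned here. It also assumes the service
sends something after connect (a banner) or in reply to the optional
payload.

```python
import socket


def probe(host, port, timeout=5.0, payload=b""):
    """Classify a TCP service as 'down', 'hung', 'closed', or 'ok'.

    'hung' means the connect succeeded but no data arrived within
    `timeout` seconds, which is the symptom described above.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable, or the connect itself timed out.
        return "down"
    with sock:
        sock.settimeout(timeout)
        try:
            if payload:
                sock.sendall(payload)
            data = sock.recv(1)
        except socket.timeout:
            return "hung"
        except OSError:
            return "down"
    # An empty read means the peer closed the connection without answering.
    return "ok" if data else "closed"


if __name__ == "__main__":
    # Hypothetical target: an SMTP container, which sends a banner on connect.
    print(probe("127.0.0.1", 25, timeout=3.0))
```

A check like this with a short timeout reports the box as unhealthy as soon
as it stops answering, rather than waiting on a connection that was
accepted but never served.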

I am sure it is something we have done. We have allocated something
wrong most likely and just need to be slapped one good time and told NO!
But, I don't know where to look. I will jump into the IRC channel too in
case someone is willing to help me and wants some real time data.

Thanks in advance for any help.

System information below. If there is more information that may help
solve this problem, let me know what to look for.

# uname -a
Linux atl-vz1 2.6.18-028stab056 #1 SMP Tue Jun 30 07:50:32 EDT 2009
x86_64 Intel(R) Xeon(R) CPU E5420 @ 2.50GHz GenuineIntel GNU/Linux

*  sys-kernel/openvz-sources
       Latest version installed: 2.6.27.5.3

System Information
	Manufacturer: Dell Inc.
	Product Name: PowerEdge 2950

# free
              total       used       free     shared    buffers     cached
Mem:      32872312   26336688    6535624          0         12   20952484
-/+ buffers/cache:    5384192   27488120
Swap:      8388656          0    8388656

# vzlist -o ctid,kmemsize,kmemsize.l -s kmemsize
       CTID   KMEMSIZE KMEMSIZE.L
        119    2025130  115710537
        116    2649072  231421075
        118    3145806   28927633
        111    3518587  115710537
        112    8613133   57855268
        121    8779664   57855268
        120   10341711  115710537
        122   10931070  231421075
        117   11024345  231421075
        113   22290970  231421075
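
To make the table above easier to read, here is a small sketch that turns
each row into a percentage of its kmemsize limit. The parsing assumes the
exact three-column layout printed by the vzlist command above, and the
function name is made up for this example.

```python
# Output of `vzlist -o ctid,kmemsize,kmemsize.l -s kmemsize`, copied from above.
SAMPLE = """\
       CTID   KMEMSIZE KMEMSIZE.L
        119    2025130  115710537
        116    2649072  231421075
        118    3145806   28927633
        111    3518587  115710537
        112    8613133   57855268
        121    8779664   57855268
        120   10341711  115710537
        122   10931070  231421075
        117   11024345  231421075
        113   22290970  231421075
"""


def kmem_usage(text):
    """Return (ctid, held, limit, held/limit) tuples, highest ratio first."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip the header and anything that is not three numeric columns.
        if len(parts) != 3 or not all(p.isdigit() for p in parts):
            continue
        ctid, held, limit = (int(p) for p in parts)
        rows.append((ctid, held, limit, held / limit))
    return sorted(rows, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    for ctid, held, limit, ratio in kmem_usage(SAMPLE):
        print(f"{ctid:>6} {held:>10} {limit:>10} {ratio:6.1%}")
```

On these numbers the highest container is 121 at about 15% of its limit, so
this snapshot by itself does not point at kmemsize exhaustion. It is only a
single point in time, though, not a record of what happened before a hang.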


-- 

Brian.
--------
http://brian.moonspot.net/

