[Users] Need help with hanging servers

Brian Moon brian at moonspot.net
Tue Jul 6 11:33:36 EDT 2010


Hi,

I have been lurking on the list for a while before posting. We are relatively 
new and light OpenVZ users. We have three physical boxes that run 
OpenVZ. One hosts our developers' environment, where each developer 
has his own container. Occasionally a container stops responding 
because it has used too many resources, but the server as a whole 
stays up. That is almost always the devs' fault.

The other two installs are in production. They are sort of 
catch-all boxes: things like Cacti, Nagios, and miscellaneous web 
apps (webmail, etc.), plus containers for custom outgoing SMTP 
servers and a dedicated container running Gearman workers written 
in PHP.

The management of OpenVZ is great. We love it. We just have one problem: 
at irregular intervals, the two production servers hang. And it is a 
weird hang. They still respond to ping, and TCP connections are accepted 
(connect) but never get a response. So our monitoring hangs for a while 
waiting on an answer. Likewise, our load balancers don't mark them as 
down until some time after they stop responding. I am hoping that 
detail is a clue for someone. There is nothing in syslog on the host 
or in any container, and nothing on the console. It sounds like a 
resource issue. We have tried moving containers around, leaving some 
off for a while, and other things to find the offending container, 
but nothing has worked. One or the other locks up every 5-6 days, and 
not on a fixed schedule that would point to a particular cron job.

I am sure it is something we have done. Most likely we have allocated 
something wrong and just need to be slapped one good time and told NO! 
But I don't know where to look. I will also jump into the IRC channel 
in case someone is willing to help and wants some real-time data.

Thanks in advance for any help.

System information below. If there is more information that may help 
solve this problem, let me know what to look for.

# uname -a
Linux atl-vz1 2.6.18-028stab056 #1 SMP Tue Jun 30 07:50:32 EDT 2009 x86_64 Intel(R) Xeon(R) CPU E5420 @ 2.50GHz GenuineIntel GNU/Linux

*  sys-kernel/openvz-sources
       Latest version installed: 2.6.27.5.3

System Information
	Manufacturer: Dell Inc.
	Product Name: PowerEdge 2950

# free
              total       used       free     shared    buffers     cached
Mem:      32872312   26336688    6535624          0         12   20952484
-/+ buffers/cache:    5384192   27488120
Swap:      8388656          0    8388656

# vzlist -o ctid,kmemsize,kmemsize.l -s kmemsize
       CTID   KMEMSIZE KMEMSIZE.L
        119    2025130  115710537
        116    2649072  231421075
        118    3145806   28927633
        111    3518587  115710537
        112    8613133   57855268
        121    8779664   57855268
        120   10341711  115710537
        122   10931070  231421075
        117   11024345  231421075
        113   22290970  231421075
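Since the kmemsize numbers above only show current usage, one more place worth checking for a resource problem is the per-container failure counters in /proc/user_beancounters on the hardware node. A nonzero failcnt means the kernel refused an allocation for that container because it hit a barrier or limit. Below is a minimal Python sketch that lists every counter with a nonzero failcnt. It assumes the usual column layout (uid, resource, held, maxheld, barrier, limit, failcnt), where the container ID appears only on the first row of each container's block. The helper names parse_beancounters and failing are hypothetical, not part of any OpenVZ tool.

```python
import os
from typing import List, NamedTuple, Optional


class Counter(NamedTuple):
    ctid: str
    resource: str
    held: int
    maxheld: int
    barrier: int
    limit: int
    failcnt: int


def parse_beancounters(text: str) -> List[Counter]:
    """Parse /proc/user_beancounters text (assumed column layout, see above)."""
    rows: List[Counter] = []
    ctid: Optional[str] = None
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the "Version:" line and the column header.
        if not fields or fields[0] in ("Version:", "uid"):
            continue
        # The first row of each container's block starts with "<ctid>:".
        if fields[0].endswith(":"):
            ctid = fields[0][:-1]
            fields = fields[1:]
        if ctid is None or len(fields) != 6:
            continue
        try:
            held, maxheld, barrier, limit, failcnt = (int(x) for x in fields[1:])
        except ValueError:
            continue  # not a numeric data row
        rows.append(Counter(ctid, fields[0], held, maxheld, barrier, limit, failcnt))
    return rows


def failing(rows: List[Counter]) -> List[Counter]:
    """Return only the counters whose allocations have been refused at least once."""
    return [r for r in rows if r.failcnt > 0]


if __name__ == "__main__":
    path = "/proc/user_beancounters"  # readable by root on the hardware node
    if os.path.exists(path):
        with open(path) as f:
            for r in failing(parse_beancounters(f.read())):
                print(f"CT {r.ctid}: {r.resource} failcnt={r.failcnt} "
                      f"maxheld={r.maxheld} barrier={r.barrier} limit={r.limit}")
```

Counters whose failcnt keeps climbing between hangs would be the first candidates to raise or investigate.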


-- 

Brian.
--------
http://brian.moonspot.net/

