[Users] Need help with hanging servers
Brian Moon
brian at moonspot.net
Tue Jul 6 11:33:36 EDT 2010
Hi,
Been lurking on the list for a bit before I posted. We are relatively
new and light OpenVZ users. We have three physical boxes that use
OpenVZ. One is the server that is home to our developers' environment.
Each developer has his own container. We have the occasional container
stop responding due to too many resources used, but the entire server is
fine. That is almost always the devs' fault.
The other two installs we have are in production. They are sort of
miscellaneous installation boxes. Things like cacti, nagios, misc web
apps (web mail, etc.) as well as having containers for custom outgoing
SMTP servers and running Gearman workers written in PHP on a dedicated
container.
The management of OpenVZ is great. We love it. We just have one problem.
On no regular schedule, the two production servers will hang. And it is
a weird hang. They still respond to ping. And TCP connections answer
(connect) but don't respond. So, our monitoring hangs for a while
waiting on an answer. Likewise our load balancers don't see them as down
for a while after they are not responding. It is just weird. I am hoping
that is some clue for someone. There is nothing in syslog on the host
server or any containers. There is nothing on the console. It sounds
like a resource issue. We have tried moving containers around, leaving
some off for a while, and other stuff to find the offending container.
But, nothing has worked. One or the other locks up every 5-6 days. It
does not follow a fixed schedule, so it does not look like a particular
cron job is causing the problem.
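The symptom described above (the TCP connect succeeds but nothing ever answers) can be told apart from a plain "port down" with a small probe. This is a minimal sketch, not what our monitoring actually runs: the function name `probe`, the result strings, and the timeout values are all made up for illustration.

```python
import socket

def probe(host, port, payload=b"", timeout=5.0):
    """Classify a TCP service as connect-timeout, connect-failed,
    no-response, closed, error, or ok. All labels are hypothetical."""
    try:
        # Step 1: the TCP handshake. On the hung boxes this still succeeds.
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        return "connect-timeout"
    except OSError:
        return "connect-failed"
    with sock:
        sock.settimeout(timeout)
        try:
            # Step 2: optionally send a request, then wait for one byte back.
            if payload:
                sock.sendall(payload)
            data = sock.recv(1)
        except socket.timeout:
            # Connected, but the service never answered: the "weird hang".
            return "no-response"
        except OSError:
            return "error"
    return "ok" if data else "closed"
```

A service that speaks first, such as SMTP with its banner, can be probed with an empty payload. For HTTP, something like `payload=b"HEAD / HTTP/1.0\r\n\r\n"` would be needed before any reply can be expected. A short timeout lets a check report "no-response" quickly instead of hanging along with the server.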
I am sure it is something we have done. We have allocated something
wrong most likely and just need to be slapped one good time and told NO!
But, I don't know where to look. I will jump into the IRC channel too in
case someone is willing to help me and wants some real time data.
Thanks in advance for any help.
System information below. If there is more information that may help
solve this problem, let me know what to look for.
# uname -a
Linux atl-vz1 2.6.18-028stab056 #1 SMP Tue Jun 30 07:50:32 EDT 2009
x86_64 Intel(R) Xeon(R) CPU E5420 @ 2.50GHz GenuineIntel GNU/Linux
* sys-kernel/openvz-sources
Latest version installed: 2.6.27.5.3
System Information
Manufacturer: Dell Inc.
Product Name: PowerEdge 2950
# free
             total       used       free     shared    buffers     cached
Mem:      32872312   26336688    6535624          0         12   20952484
-/+ buffers/cache:    5384192   27488120
Swap:      8388656          0    8388656
# vzlist -o ctid,kmemsize,kmemsize.l -s kmemsize
CTID   KMEMSIZE  KMEMSIZE.L
 119    2025130   115710537
 116    2649072   231421075
 118    3145806    28927633
 111    3518587   115710537
 112    8613133    57855268
 121    8779664    57855268
 120   10341711   115710537
 122   10931070   231421075
 117   11024345   231421075
 113   22290970   231421075
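
To make the listing above easier to read, here is a minimal sketch that turns each row into "kmemsize held as a percentage of its limit". The numbers are copied from the vzlist output above. The function name `kmem_usage` is hypothetical. This only describes the one snapshot shown here and says nothing about peaks between samples.

```python
# Rows copied from the vzlist output: CTID, kmemsize held, kmemsize limit.
VZLIST_ROWS = """\
119 2025130 115710537
116 2649072 231421075
118 3145806 28927633
111 3518587 115710537
112 8613133 57855268
121 8779664 57855268
120 10341711 115710537
122 10931070 231421075
117 11024345 231421075
113 22290970 231421075
"""

def kmem_usage(text):
    """Return (ctid, held, limit, percent) tuples, highest percent first."""
    result = []
    for line in text.splitlines():
        parts = line.split()
        # Skip headers or anything that is not three integer columns.
        if len(parts) != 3 or not all(p.isdigit() for p in parts):
            continue
        ctid, held, limit = (int(p) for p in parts)
        result.append((ctid, held, limit, 100.0 * held / limit))
    return sorted(result, key=lambda row: row[3], reverse=True)

if __name__ == "__main__":
    for ctid, held, limit, pct in kmem_usage(VZLIST_ROWS):
        print(f"{ctid:>5} {held:>10} {limit:>10} {pct:6.2f}%")
```

In this snapshot the container closest to its limit is 121, at roughly 15% of its kmemsize limit.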
--
Brian.
--------
http://brian.moonspot.net/