[Users] Need help with hanging servers

John Drescher drescherjm at gmail.com
Tue Jul 6 13:59:16 EDT 2010


On Tue, Jul 6, 2010 at 1:43 PM, Scott Dowdle <dowdle at montanalinux.org> wrote:
> Greetings,
>
>
> ----- Original Message -----
>> Been lurking on the list for a bit before I posted. We are relatively
>> new and light OpenVZ users. We have three physical boxes that use
>> OpenVZ. One is the server that is home to our developers' environment.
>> Each developer has his own container. We have the occasional container
>> stop responding due to too many resources used, but the entire server
>> is fine. That is almost always the devs' fault.
>>
>> The other two installs we have are in production. They are sort of
>> miscellaneous installation boxes. Things like cacti, nagios, misc web
>> apps (web mail, etc.) as well as having containers for custom outgoing
>> SMTP servers and running Gearman workers written in PHP on a dedicated
>> container.
>>
>> The management of OpenVZ is great. We love it. We just have one problem.
>> On no regular schedule, the two production servers will hang. And it is
>> a weird hang. They still respond to ping. And TCP connections answer
>> (connect) but don't respond. So, our monitoring hangs for a while
>> waiting on an answer. Likewise our load balancers don't see them as down
>> for a while after they are not responding. It is just weird. I am hoping
>> that is some clue for someone. There is nothing in syslog on the host
>> server or any containers. There is nothing on the console. It sounds
>> like a resource issue. We have tried moving containers around, leaving
>> some off for a while, and other stuff to find the offending container.
>> But, nothing has worked. One or the other locks up every 5-6 days. Not
>> on a schedule like it is a particular cron job causing the problem.
>>
>> I am sure it is something we have done. We have allocated something
>> wrong most likely and just need to be slapped one good time and told NO!
>> But, I don't know where to look. I will jump into the IRC channel too in
>> case someone is willing to help me and wants some real time data.
>>
>> Thanks in advance for any help.
>>
>> System information below. If there is more information that may help
>> solve this problem, let me know what to look for.
>>
>> # uname -a
>> Linux atl-vz1 2.6.18-028stab056 #1 SMP Tue Jun 30 07:50:32 EDT 2009
>> x86_64 Intel(R) Xeon(R) CPU E5420 @ 2.50GHz GenuineIntel GNU/Linux
>>
>> * sys-kernel/openvz-sources
>> Latest version installed: 2.6.27.5.3
>>
>> System Information
>> Manufacturer: Dell Inc.
>> Product Name: PowerEdge 2950
>>
>> # free
>>              total       used       free     shared    buffers     cached
>> Mem:      32872312   26336688    6535624          0         12   20952484
>> -/+ buffers/cache:    5384192   27488120
>> Swap:      8388656          0    8388656
>>
>> # vzlist -o ctid,kmemsize,kmemsize.l -s kmemsize
>>       CTID   KMEMSIZE KMEMSIZE.L
>>        119    2025130  115710537
>>        116    2649072  231421075
>>        118    3145806   28927633
>>        111    3518587  115710537
>>        112    8613133   57855268
>>        121    8779664   57855268
>>        120   10341711  115710537
>>        122   10931070  231421075
>>        117   11024345  231421075
>>        113   22290970  231421075
>
> You didn't mention whether you had any failcnts in the containers. Do you?
>
> You probably already know this but it doesn't hurt to mention, 2.6.27.x is not a "stable" OpenVZ kernel branch.
>

I guess it depends on your usage. I used openvz-2.6.27 on Gentoo for
over two years on production servers with very few issues. Now I am
using 2.6.32 on many of these.
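
On the failcnt question quoted above: the counters live in
/proc/user_beancounters on the host, and the last column of each row is
failcnt. Below is a rough Python sketch for picking out the non-zero ones.
It assumes the usual layout (a "Version:" line, a header row, then one row
per resource, with the container ID prefixed only on the first row of each
container). The function name find_failcnts and the SAMPLE text are made up
for illustration; the numbers in SAMPLE are not from the servers in this
thread.

```python
def find_failcnts(text):
    """Return (ctid, resource, failcnt) for every row with failcnt > 0."""
    hits = []
    ctid = None
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the "Version:" line and the column header row.
        if not fields or fields[0] in ("Version:", "uid"):
            continue
        # The first row of each container starts with "<ctid>:".
        if fields[0].endswith(":"):
            ctid = fields[0].rstrip(":")
            fields = fields[1:]
        # Expect: resource held maxheld barrier limit failcnt
        if len(fields) != 6:
            continue
        resource, failcnt = fields[0], int(fields[-1])
        if failcnt > 0:
            hits.append((ctid, resource, failcnt))
    return hits


# Hypothetical sample in the assumed /proc/user_beancounters layout.
SAMPLE = """\
Version: 2.5
       uid  resource      held   maxheld    barrier      limit  failcnt
      101:  kmemsize   2025130   2500000  115710537  127281590        0
            numproc         40       240        240        240        3
      102:  kmemsize   8613133  57855268   57855268   63640794       12
"""

for ctid, resource, failcnt in find_failcnts(SAMPLE):
    print(ctid, resource, failcnt)
```

On a real host you would pass in the contents of /proc/user_beancounters
(read as root) instead of SAMPLE. Any resource that shows up here has hit
its limit at least once since the container started, which is a reasonable
first place to look.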

John


