Hi Sirk,<div><br>Thanks for your reply. I'm so pleased to have found this mailing list after trying the forum, which seems to have very little activity!</div><div><br></div><div>Ploop is a great idea technically, but I'm a little concerned about the "Warning: This is a new feature, not yet ready for production systems. Use with caution." on the OpenVZ Wiki page, so I'm kind of waiting for the green light that it's ready for production environments.</div>
<div><br></div><div>It did occur to me that disk I/O could be the cause of the problem, but iostat on the hardware node did not suggest any particular I/O problems. I still haven't found a way to see the I/O activity within a container - iostat just comes up blank when run within a container. Is there a way?</div>
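<div><br></div><div>One thing I've been experimenting with, in case it helps anyone: iostat is blank inside a container because the container sees no block devices, but the hardware node's OpenVZ kernel may expose per-beancounter I/O counters at /proc/bc/&lt;CTID&gt;/ioacct (I'm not sure which kernel builds have this, and the field names may vary - treat the path and layout here as assumptions). A small sketch for summarizing the cumulative read/write; the parsing works on any saved copy of the file, and the sample values below are made up:</div>

```shell
# Summarize cumulative read/write from an ioacct-style dump, in MB.
# Assumed source on the hardware node: /proc/bc/<CTID>/ioacct
# (lines of "name value", values in bytes; layout varies by kernel).
ioacct_summary() {
  awk '$1 == "read" || $1 == "write" { printf "%s %.1f MB\n", $1, $2 / 1048576 }' "$1"
}

# Example against a saved sample (made-up values):
cat > /tmp/ioacct.sample <<'EOF'
read 5242880
write 10485760
dirty 0
cancel 0
EOF
ioacct_summary /tmp/ioacct.sample
```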
<div><br></div><div>We're not using any network storage with this server, so that is not the cause.</div><div><br></div><div>The server has 4 SATA-3 drives: the root partition is on one drive, the problem container is alone on a second, and the remaining containers are on a third.</div>
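<div><br></div><div>Since the episodes only last 15-30 minutes, I'm thinking of keeping periodic "iostat -x" captures on the hardware node so the next one is on record, and then filtering for the drive that saturates. A rough sketch (assuming sysstat's extended layout with %util as the last column - the sample capture below is made up):</div>

```shell
# Flag devices whose %util exceeds a threshold in saved "iostat -x"
# output, to spot which drive saturates during a load spike.
# Assumes sysstat's layout with %util as the last field.
high_util() {
  awk -v limit="$1" \
    'NF > 2 && $NF ~ /^[0-9.]+$/ && $NF + 0 > limit + 0 { print $1, $NF }' "$2"
}

# Example against a saved capture (made-up values):
cat > /tmp/iostat.sample <<'EOF'
Device: rrqm/s wrqm/s r/s w/s %util
sda 0.00 1.20 3.4 5.6 12.30
sdb 0.00 0.40 80.1 60.2 97.80
EOF
high_util 90 /tmp/iostat.sample
```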
<div><br>Best,</div><div>Rene</div><div><br><div class="gmail_quote">On Tue, May 22, 2012 at 3:06 PM, Sirk Johannsen <span dir="ltr"><<a href="mailto:s.johannsen@satzmedia.de" target="_blank">s.johannsen@satzmedia.de</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Rene,<br>
<br>
Since CPU and MEM are fine, it's most likely disk I/O.<br>
I have similar problems with a cluster setup based on OpenVZ.<br>
The problem is that our storage is way too slow.<br>
We had been accessing the storage via NFS and put all our CTs'<br>
private areas on it.<br>
I noticed many times that one CT was doing a lot of disk I/O and all<br>
the others were suffering from it... that even led to total system<br>
failures.<br>
This was solved by converting everything to ploop. Since then our<br>
system has at least been stable.<br>
I/O performance is still an issue, but it no longer brings the system down.<br>
<br>
You should give ploop a try :-) I am very happy with it.<br>
<br>
best regards,<br>
<br>
Sirk<br>
<br>
2012/5/21 Rene Dokbua <<a href="mailto:openvz@dokbua.com">openvz@dokbua.com</a>>:<br>
<div><div class="h5">> Hello,<br>
><br>
> I occasionally get this extreme load on one of our VPS servers. It is quite<br>
> large: 4 full E3-1230 cores, 4 GB RAM, hosting ca. 400 websites +<br>
> parked/addon/subdomains.<br>
><br>
> The hardware node has 12 active VPS servers and most of the time things are<br>
> chugging along just fine, something like this.<br>
><br>
> 1401: 0.00 0.00 0.00 1/23 4561<br>
> 1402: 0.02 0.05 0.05 1/57 16991<br>
> 1404: 0.01 0.02 0.00 1/73 18863<br>
> 1406: 0.07 0.13 0.06 1/39 31189<br>
> 1407: 0.86 1.03 1.14 1/113 31460<br>
> 1408: 0.17 0.17 0.18 1/79 32579<br>
> 1409: 0.00 0.00 0.02 1/77 21784<br>
> 1410: 0.01 0.02 0.00 1/60 7454<br>
> 1413: 0.00 0.00 0.00 1/46 18579<br>
> 1414: 0.00 0.00 0.00 1/41 23812<br>
> 1415: 0.00 0.00 0.00 1/45 9831<br>
> 1416: 0.05 0.02 0.00 1/59 11332<br>
> 12 active<br>
><br>
> The problem VPS is 1407. As you can see below, it uses only a fraction of<br>
> the CPU and memory.<br>
><br>
> top - 17:34:12 up 32 days, 12:21, 0 users, load average: 0.78, 0.95, 1.09<br>
> Tasks: 102 total, 4 running, 90 sleeping, 0 stopped, 8 zombie<br>
> Cpu(s): 16.3%us, 2.9%sy, 0.4%ni, 78.5%id, 1.8%wa, 0.0%hi, 0.0%si,<br>
> 0.1%st<br>
> Mem: 4194304k total, 2550572k used, 1643732k free, 0k buffers<br>
> Swap: 8388608k total, 105344k used, 8283264k free, 1793828k cached<br>
><br>
> Also, iostat and vmstat show no particular I/O or swap activity.<br>
><br>
> Now for the problem. Every once in a while the loadavg of this particular<br>
> VPS shoots up to crazy values, 30 or more, and it becomes completely<br>
> sluggish. The odd thing is that load goes up for the VPS, and starts<br>
> spilling into other VPS servers on the same hardware node - but there is<br>
> still no particular cpu/memory/io usage going on that I can see. No<br>
> particular network activity. In this example load has fallen back to<br>
> around 10, but it was much higher earlier.<br>
><br>
> 16:19:44 up 32 days, 11:19, 3 users, load average: 12.87, 19.11, 18.87<br>
><br>
> 1401: 0.01 0.03 0.00 1/23 2876<br>
> 1402: 0.00 0.11 0.13 1/57 15334<br>
> 1404: 0.02 0.20 0.16 1/77 14918<br>
> 1406: 0.01 0.13 0.10 1/39 29595<br>
> 1407: 10.95 15.71 15.05 1/128 13950<br>
> 1408: 0.36 0.52 0.57 1/81 27167<br>
> 1409: 0.09 0.26 0.43 1/78 17851<br>
> 1410: 0.09 0.17 0.18 1/61 4344<br>
> 1413: 0.00 0.03 0.00 1/46 16539<br>
> 1414: 0.01 0.01 0.00 1/41 22372<br>
> 1415: 0.00 0.01 0.00 1/45 8404<br>
> 1416: 0.05 0.10 0.11 1/58 9292<br>
> 12 active<br>
><br>
> top - 16:20:02 up 32 days, 11:07, 0 users, load average: 9.14, 14.97,<br>
> 14.82<br>
> Tasks: 135 total, 1 running, 122 sleeping, 0 stopped, 12 zombie<br>
> Cpu(s): 16.3%us, 2.9%sy, 0.4%ni, 78.5%id, 1.8%wa, 0.0%hi, 0.0%si,<br>
> 0.1%st<br>
> Mem: 4194304k total, 1173844k used, 3020460k free, 0k buffers<br>
> Swap: 8388608k total, 115576k used, 8273032k free, 725144k cached<br>
><br>
> Notice how the CPU is mostly idle, and only 1/4 of the available memory is<br>
> being used.<br>
><br>
> <a href="http://wiki.openvz.org/Ploop/Why" target="_blank">http://wiki.openvz.org/Ploop/Why</a> explains "One such property that deserves a<br>
> special item in this list is file system journal. While journal is a good<br>
> thing to have, because it helps to maintain file system integrity and<br>
> improve reboot times (by eliminating fsck in many cases), it is also a<br>
> bottleneck for containers. If one container will fill up in-memory journal<br>
> (with lots of small operations leading to file metadata updates, e.g. file<br>
> truncates), all the other containers I/O will block waiting for the journal<br>
> to be written to disk. In some extreme cases we saw up to 15 seconds of such<br>
> blockage.". The problem I noticed lasts much longer than 15 seconds though<br>
> - typically 15-30 minutes, then load drops back to where it should be.<br>
><br>
> Any suggestions where I could look for the cause of this? It's not like it<br>
> happens every day, maybe once or twice per month, but it's enough to cause<br>
> customers to complain.<br>
><br>
> Regards,<br>
> Rene<br>
><br>
><br>
</div></div>> _______________________________________________<br>
> Users mailing list<br>
> <a href="mailto:Users@openvz.org">Users@openvz.org</a><br>
> <a href="https://openvz.org/mailman/listinfo/users" target="_blank">https://openvz.org/mailman/listinfo/users</a><br>
><br>
<br>
<br>
<br>
--<br>
Satzmedia GmbH<br>
<br>
Altonaer Poststraße 9<br>
22767 Hamburg<br>
Tel: +49 (0) 40 - 1 888 969 - 140<br>
Fax: +49 (0) 40 - 1 888 969 - 200<br>
E-Mail: <a href="mailto:s.johannsen@satzmedia.de">s.johannsen@satzmedia.de</a><br>
E-Business-Lösungen: <a href="http://www.satzmedia.de" target="_blank">http://www.satzmedia.de</a><br>
Amtsgericht Hamburg, HRB 71729<br>
Ust-IDNr. DE201979921<br>
Geschäftsführer:<br>
Dipl.-Kfm. Christian Satz<br>
Dipl.-Inform. Markus Meyer-Westphal<br>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
<br>
<br>
</font></span></blockquote></div><br></div>