Hi Sirk,<div><br>Thanks for your reply. I'm so pleased to have found this mailing list after trying the forum, which seems to have very little activity!</div><div><br></div><div>Ploop is a great idea technically, but I'm a little concerned about the "Warning: This is a new feature, not yet ready for production systems. Use with caution." on the OpenVZ Wiki page, so I'm kind of waiting for the green light that it's ready for production environments.</div>
<div><br></div><div>It did occur to me that disk I/O could be the cause of the problem, but iostat on the hardware node did not suggest any particular I/O problems. I still haven't found a way to see the I/O activity within a container - iostat just comes up blank when run within a container. Is there a way?</div>
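<div><br></div><div>One thing I've been experimenting with, in case it helps anyone: iostat is blank inside a container because the container sees no block devices, but the hardware node's OpenVZ kernel may expose per-beancounter I/O counters at /proc/bc/&lt;CTID&gt;/ioacct (I'm not sure which kernel builds have this, and the field names may vary - treat the path and layout here as assumptions). A small sketch for summarizing the cumulative read/write; the parsing works on any saved copy of the file, and the sample values below are made up:</div>

```shell
# Summarize cumulative read/write from an ioacct-style dump, in MB.
# Assumed source on the hardware node: /proc/bc/<CTID>/ioacct
# (lines of "name value", values in bytes; layout varies by kernel).
ioacct_summary() {
  awk '$1 == "read" || $1 == "write" { printf "%s %.1f MB\n", $1, $2 / 1048576 }' "$1"
}

# Example against a saved sample (made-up values):
cat > /tmp/ioacct.sample <<'EOF'
read 5242880
write 10485760
dirty 0
cancel 0
EOF
ioacct_summary /tmp/ioacct.sample
```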
<div><br></div><div>We're not using any network storage with this server, so that is not the cause.</div><div><br></div><div>The server has 4 SATA-3 drives: the root partition is on one drive, the problem container is alone on a second, and the remaining containers are on a third.</div>
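<div><br></div><div>Since the episodes only last 15-30 minutes, I'm thinking of keeping periodic "iostat -x" captures on the hardware node so the next one is on record, and then filtering for the drive that saturates. A rough sketch (assuming sysstat's extended layout with %util as the last column - the sample capture below is made up):</div>

```shell
# Flag devices whose %util exceeds a threshold in saved "iostat -x"
# output, to spot which drive saturates during a load spike.
# Assumes sysstat's layout with %util as the last field.
high_util() {
  awk -v limit="$1" \
    'NF > 2 && $NF ~ /^[0-9.]+$/ && $NF + 0 > limit + 0 { print $1, $NF }' "$2"
}

# Example against a saved capture (made-up values):
cat > /tmp/iostat.sample <<'EOF'
Device: rrqm/s wrqm/s r/s w/s %util
sda 0.00 1.20 3.4 5.6 12.30
sdb 0.00 0.40 80.1 60.2 97.80
EOF
high_util 90 /tmp/iostat.sample
```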
<div><br>Best,</div><div>Rene</div><div><br><div class="gmail_quote">On Tue, May 22, 2012 at 3:06 PM, Sirk Johannsen <span dir="ltr"><<a href="mailto:s.johannsen@satzmedia.de" target="_blank">s.johannsen@satzmedia.de</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Rene,<br>
<br>
Since CPU and MEM are fine, it's most likely disk I/O.<br>
I have similar problems with a cluster setup based on OpenVZ.<br>
The problem is that our storage is way too slow.<br>
We had been accessing the storage via NFS and put all our CTs'<br>
private areas on it.<br>
I noticed many times that one CT was doing a lot of disk I/O and all<br>
the others were suffering from it... that even led to total system<br>
failures.<br>
This was solved by converting everything to ploop. Since then our<br>
system has at least been stable.<br>
I/O performance is still an issue, but it no longer brings the system down.<br>
<br>
You should give ploop a try :-) I am very happy with it.<br>
<br>
best regards,<br>
<br>
Sirk<br>
<br>
2012/5/21 Rene Dokbua <<a href="mailto:openvz@dokbua.com">openvz@dokbua.com</a>>:<br>
<div><div class="h5">> Hello,<br>
><br>
> I occasionally get this extreme load on one of our VPS servers. It is quite<br>
> large: 4 full E3-1230 cores, 4 GB RAM, hosting ca. 400 websites +<br>
> parked/addon/subdomains.<br>
><br>
> The hardware node has 12 active VPS servers and most of the time things are<br>
> chugging along just fine, something like this.<br>
><br>
> 1401: 0.00 0.00 0.00 1/23 4561<br>
> 1402: 0.02 0.05 0.05 1/57 16991<br>
> 1404: 0.01 0.02 0.00 1/73 18863<br>
> 1406: 0.07 0.13 0.06 1/39 31189<br>
> 1407: 0.86 1.03 1.14 1/113 31460<br>
> 1408: 0.17 0.17 0.18 1/79 32579<br>
> 1409: 0.00 0.00 0.02 1/77 21784<br>
> 1410: 0.01 0.02 0.00 1/60 7454<br>
> 1413: 0.00 0.00 0.00 1/46 18579<br>
> 1414: 0.00 0.00 0.00 1/41 23812<br>
> 1415: 0.00 0.00 0.00 1/45 9831<br>
> 1416: 0.05 0.02 0.00 1/59 11332<br>
> 12 active<br>
><br>
> The problem VPS is 1407. As you can see below, it uses only a fraction of<br>
> the CPU and memory.<br>
><br>
> top - 17:34:12 up 32 days, 12:21, 0 users, load average: 0.78, 0.95, 1.09<br>
> Tasks: 102 total, 4 running, 90 sleeping, 0 stopped, 8 zombie<br>
> Cpu(s): 16.3%us, 2.9%sy, 0.4%ni, 78.5%id, 1.8%wa, 0.0%hi, 0.0%si,<br>
> 0.1%st<br>
> Mem: 4194304k total, 2550572k used, 1643732k free, 0k buffers<br>
> Swap: 8388608k total, 105344k used, 8283264k free, 1793828k cached<br>
><br>
> Also, iostat and vmstat show no particular I/O or swap activity.<br>
><br>
> Now for the problem. Every once in a while the loadavg of this particular<br>
> VPS shoots up to crazy values, 30 or more, and it becomes completely<br>
> sluggish. The odd thing is that load goes up for the VPS, and starts<br>
> spilling into other VPS servers on the same hardware node - but there is<br>
> still no particular cpu/memory/io usage going on that I can see. No<br>
> particular network activity. In this example load has fallen back to<br>
> around 10, but it was much higher earlier.<br>
><br>
> 16:19:44 up 32 days, 11:19, 3 users, load average: 12.87, 19.11, 18.87<br>
><br>
> 1401: 0.01 0.03 0.00 1/23 2876<br>
> 1402: 0.00 0.11 0.13 1/57 15334<br>
> 1404: 0.02 0.20 0.16 1/77 14918<br>
> 1406: 0.01 0.13 0.10 1/39 29595<br>
> 1407: 10.95 15.71 15.05 1/128 13950<br>
> 1408: 0.36 0.52 0.57 1/81 27167<br>
> 1409: 0.09 0.26 0.43 1/78 17851<br>
> 1410: 0.09 0.17 0.18 1/61 4344<br>
> 1413: 0.00 0.03 0.00 1/46 16539<br>
> 1414: 0.01 0.01 0.00 1/41 22372<br>
> 1415: 0.00 0.01 0.00 1/45 8404<br>
> 1416: 0.05 0.10 0.11 1/58 9292<br>
> 12 active<br>
><br>
> top - 16:20:02 up 32 days, 11:07, 0 users, load average: 9.14, 14.97,<br>
> 14.82<br>
> Tasks: 135 total, 1 running, 122 sleeping, 0 stopped, 12 zombie<br>
> Cpu(s): 16.3%us, 2.9%sy, 0.4%ni, 78.5%id, 1.8%wa, 0.0%hi, 0.0%si,<br>
> 0.1%st<br>
> Mem: 4194304k total, 1173844k used, 3020460k free, 0k buffers<br>
> Swap: 8388608k total, 115576k used, 8273032k free, 725144k cached<br>
><br>
> Notice how the CPU is mostly idle, and only 1/4 of the available memory is<br>
> being used.<br>
><br>
> <a href="http://wiki.openvz.org/Ploop/Why" target="_blank">http://wiki.openvz.org/Ploop/Why</a> explains "One such property that deserves a<br>
> special item in this list is file system journal. While journal is a good<br>
> thing to have, because it helps to maintain file system integrity and<br>
> improve reboot times (by eliminating fsck in many cases), it is also a<br>
> bottleneck for containers. If one container will fill up in-memory journal<br>
> (with lots of small operations leading to file metadata updates, e.g. file<br>
> truncates), all the other containers I/O will block waiting for the journal<br>
> to be written to disk. In some extreme cases we saw up to 15 seconds of such<br>
> blockage.". The problem I noticed lasts much longer than 15 seconds though<br>
> - typically 15-30 minutes, then load drops back to where it should be.<br>
><br>
> Any suggestions where I could look for the cause of this? It's not like it<br>
> happens every day, maybe once or twice per month, but it's enough to cause<br>
> customers to complain.<br>
><br>
> Regards,<br>
> Rene<br>
><br>
><br>
</div></div>> _______________________________________________<br>
> Users mailing list<br>
> <a href="mailto:Users@openvz.org">Users@openvz.org</a><br>
> <a href="https://openvz.org/mailman/listinfo/users" target="_blank">https://openvz.org/mailman/listinfo/users</a><br>
><br>
<br>
<br>
<br>
--<br>
Satzmedia GmbH<br>
<br>
Altonaer Poststraße 9<br>
22767 Hamburg<br>
Tel: +49 (0) 40 - 1 888 969 - 140<br>
Fax: +49 (0) 40 - 1 888 969 - 200<br>
E-Mail: <a href="mailto:s.johannsen@satzmedia.de">s.johannsen@satzmedia.de</a><br>
E-Business-Lösungen: <a href="http://www.satzmedia.de" target="_blank">http://www.satzmedia.de</a><br>
Amtsgericht Hamburg, HRB 71729<br>
Ust-IDNr. DE201979921<br>
Geschäftsführer:<br>
Dipl.-Kfm. Christian Satz<br>
Dipl.-Inform. Markus Meyer-Westphal<br>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
<br>
<br>
</font></span></blockquote></div><br></div>