[Users] Host IO delay high - Beancounters failure

Wed Dec 9 06:31:17 PST 2015

Hey OpenVZ users,

I’m currently encountering a weird issue but don’t know how to fix it.

Some containers on our node occasionally go haywire. That in itself is not
the issue; it is customer-related.

However, when it happens they almost take the node down. The host node shows
an IO delay of 30-50% during that time and needs about 5 minutes to become
stable again. SSH login is (almost) impossible, and the shell does not run
properly.

Dmesg reports the following:

__ratelimit: 1892 callbacks suppressed

Fatal resource shortage: privvmpages, UB 371.

After taking a look at the bean counters of that container I got the
following:

----------------------------------------------------------------
CT 371       | HELD Bar% Lim%| MAXH Bar% Lim%| BAR | LIM | FAIL
-------------+---------------+---------------+-----+-----+------
     kmemsize|45.8M   -    - | 223M   -    - |   - |   - |    -
  lockedpages|   -    -    - |  32K   -    - |   4G|   4G|    -
  privvmpages|3.25G  27%  27%|  12G 100% 100%|  12G|  12G|  303K
     shmpages| 114M   -    - | 147M   -    - |   - |   - |    -
      numproc| 127    -    - | 318    -    - |   - |   - |    -
    physpages|1.08G   -   26%|   4G   -  100%|   - |   4G|    -
  vmguarpages|   -    -    - |   -    -    - |   8G|   - |    -
 oomguarpages|1008M  24%   - |2.45G  61%   - |   4G|   - |    -
   numtcpsock|  31    -    - | 173    -    - |   - |   - |    -
     numflock|  21    -    - |  48    -    - |   - |   - |    -
       numpty|   -    -    - |   1    -    - |   - |   - |    -
   numsiginfo|   -    -    - | 102    -    - |   - |   - |    -
    tcpsndbuf|1.22M   -    - |16.6M   -    - |   - |   - |    -
    tcprcvbuf| 496K   -    - | 2.7M   -    - |   - |   - |    -
 othersockbuf|81.3K   -    - |5.72M   -    - |   - |   - |    -
  dgramrcvbuf|   -    -    - | 117K   -    - |   - |   - |    -
 numothersock|  66    -    - | 284    -    - |   - |   - |    -
   dcachesize|24.8M   -    - | 178M   -    - |   - |   - |    -
      numfile|2.69K   -    - |3.13K   -    - |   - |   - |    -
    numiptent|  62    -    - |  62    -    - |   - |   - |    -
    swappages| 367M   -    9%| 986M   -   24%|   - |   4G|    -

A failure count of roughly 300,000 (303K) on privvmpages is not normal.
However, I’m using vSWAP and the RAM is limited to 12G.
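
For anyone who wants to check the raw counters rather than the formatted table, here is a minimal Python sketch that lists every resource with a non-zero failcnt. It assumes the usual /proc/user_beancounters layout (columns: uid, resource, held, maxheld, barrier, limit, failcnt) and 4 KiB pages. The SAMPLE text and the names failed_resources and pages_to_gib are made up for illustration; the sample values only approximate the table above and are not copied from the node.

```python
# Sketch: find beancounter resources with a non-zero failcnt.
# Assumption: input follows the usual /proc/user_beancounters layout:
#   uid: resource held maxheld barrier limit failcnt
# SAMPLE is hypothetical data that only approximates the table in this post.

SAMPLE = """\
Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
      371:  kmemsize       48000000  233000000  9223372036854775807  9223372036854775807  0
            privvmpages      851968    3145728    3145728    3145728     303000
            numproc             127        318  9223372036854775807  9223372036854775807  0
"""

PAGE_SIZE = 4096  # assumption: 4 KiB pages (x86)


def failed_resources(text):
    """Return (ctid, resource, failcnt) for every row whose failcnt > 0."""
    results = []
    ctid = None
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the version line and the column header.
        if not fields or fields[0] in ("Version:", "uid"):
            continue
        # The first row of each container starts with "<ctid>:".
        if fields[0].endswith(":"):
            ctid = int(fields[0][:-1])
            fields = fields[1:]
        if len(fields) != 6:
            continue
        resource, failcnt = fields[0], int(fields[5])
        if failcnt > 0:
            results.append((ctid, resource, failcnt))
    return results


def pages_to_gib(pages):
    """Convert a page count (e.g. a privvmpages limit) to GiB."""
    return pages * PAGE_SIZE / 2**30


if __name__ == "__main__":
    for ctid, resource, failcnt in failed_resources(SAMPLE):
        print(ctid, resource, failcnt)
    print(pages_to_gib(3145728), "GiB")
```

On a real node the same function could presumably be fed the contents of /proc/user_beancounters (read as root) instead of SAMPLE. With 4 KiB pages, a privvmpages limit of 12G corresponds to 3145728 pages.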

Node has 30GB of 64GB free, so that’s not the issue.

Does anyone have a clue why the host almost goes down? A single container
shouldn’t affect the host’s performance.

Currently running pve-kernel-2.6.32-43-pve:
2.6.32-166(pve-kernel-2.6.32-43-pve: 2.6.32-166) with vzctl 4.9-4.

Containers are using the ploop layout and reside on an LVM volume (the root
LVM partition).

Thank you for your time.

----------------------------------------------------------------------------
-------------

If you have any further questions, please let us know.

With best regards

Henry Spanka | myVirtualserver Development Team
