<div dir="ltr">Completely agree with Gena Makhomed on points he raised about ploop.<div><br></div><div>If you run HN on SSD (256gb for example) ploop is not good to use at all, too much overhead of space.</div><div><br></div><div>Would be nice to have better free space management in ploop somehow.</div><div><br></div><div>Also about OpenVZ restore option: </div><div><br></div><div>Here is real example from production environment: kannel (<a href="http://kannel.org">http://kannel.org</a>) is not working properly with OpenVZ suspend/restore feature (turned on by default), so had to use VE_STOP_MODE=stop instead.</div></div><div class="gmail_extra"><br><div class="gmail_quote">2015-07-24 15:41 GMT+03:00 Gena Makhomed <span dir="ltr"><<a href="mailto:gmm@csdoc.com" target="_blank">gmm@csdoc.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On 23.07.2015 5:44, Kir Kolyshkin wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
My experience with ploop:<br>
<br>
DISKSPACE limited to 256 GiB, real data used inside the container<br>
was around 40-50% of the 256 GiB limit, but the ploop image was a lot bigger:<br>
it used nearly 256 GiB of space on the hardware node. Overhead ~50-60%.<br>
<br>
I found a workaround for this: run "/usr/sbin/vzctl compact $CT"<br>
from cron every night, and now the ploop image has less overhead.<br>
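A root crontab entry along these lines would do it (the vzlist loop is just one way to cover all containers; adjust the schedule to taste):<br>
<br>
0 3 * * * for CT in $(vzlist -H -o ctid); do /usr/sbin/vzctl compact $CT; done<br>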
<br>
current state:<br>
<br>
on hardware node:<br>
<br>
# du -b /vz/private/155/root.hdd<br>
205963399961 /vz/private/155/root.hdd<br>
<br>
inside container:<br>
<br>
# df -B1<br>
Filesystem 1B-blocks Used Available Use% Mounted on<br>
/dev/ploop38149p1 270426705920 163129053184 94928560128 64% /<br>
<br>
====================================<br>
<br>
used space, bytes: 163129053184<br>
<br>
image size, bytes: 205963399961<br>
<br>
"ext4 over ploop over ext4" solution disk space overhead is near 26%,<br>
or is near 40 GiB, if see this disk space overhead in absolute numbers.<br>
<br>
This is the main disadvantage of ploop.<br>
<br>
And this disadvantage can't be avoided - it is "by design".<br>
</blockquote>
<br>
To anyone reading this, there are a few things here worth noting.<br>
<br>
a. Such overhead is caused by three things:<br>
1. creating then removing data (vzctl compact takes care of that)<br>
2. filesystem fragmentation (we have some experimental patches to ext4<br>
plus an ext4 defragmenter to solve it, but currently it's still in<br>
the research stage)<br>
3. initial filesystem layout (which depends on initial ext4 fs size,<br>
including inode requirement)<br>
<br>
So, #1 is solved, #2 is solvable, and #3 is a limitation of the underlying<br>
filesystem and can be mitigated<br>
by properly choosing the initial size of a newly created ploop.<br>
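In practice that means creating the container with a realistic disk size up front rather than oversizing and shrinking later; for example (CT ID, size and OS template below are only placeholders):<br>
<br>
# vzctl create 200 --layout ploop --ostemplate centos-7-x86_64 --diskspace 50G<br>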
</blockquote>
<br></div></div>
This container is compacted every night; during the working day<br>
only new static files are added to it, so this container does<br>
not perform many "create then remove data" operations.<span class=""><br>
<br>
current state:<br>
<br>
on hardware node:<br>
<br>
# du -b /vz/private/155/root.hdd<br></span>
203547480857 /vz/private/155/root.hdd<span class=""><br>
<br>
inside container:<br>
<br>
# df -B1<br>
Filesystem 1B-blocks Used Available Use% Mounted on<br></span>
/dev/ploop55410p1 270426705920 163581190144 94476423168 64% /<br>
<br>
<br>
used space, bytes: 163581190144<br>
<br>
image size, bytes: 203547480857<br>
<br>
overhead: ~ 37 GiB, ~ 19.6%<br>
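(For reference, the overhead here is simply the image size minus the used space; it can be checked with bc, for example:)<br>
<br>
# echo $(( 203547480857 - 163581190144 ))<br>
39966290713<br>
# echo "scale=1; 39966290713 / 2^30" | bc<br>
37.2<br>
# echo "scale=1; 100 * 39966290713 / 203547480857" | bc<br>
19.6<br>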
<br>
The container was compacted at 03:00<br>
by the command /usr/sbin/vzctl compact 155.<br>
<br>
Running container compaction again right now:<br>
9443 clusters have been relocated<br>
<br>
result:<br>
<br>
used space, bytes: 163604983808<br>
<br>
image size, bytes: 193740149529<br>
<br>
overhead: ~ 28 GiB, ~ 15.5%<br>
<br>
I think it is not a good idea to run ploop compaction more frequently<br>
than once per day, at night - so when planning disk space<br>
on the hardware node for all ploop images, we need to take into account<br>
not the minimal value of the overhead, but the maximal one,<br>
after 24 hours of the container working in normal mode.<br>
<br>
So the real overhead of ploop can be measured only<br>
after at least 24 hours of the container being in the running state.<span class=""><br>
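For that kind of planning it is enough to sum the actual image sizes on the hardware node, for example:<br>
<br>
# du -sbc /vz/private/*/root.hdd<br>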
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
An example of the #3 effect is this: if you create a very large filesystem<br>
initially (say, 16TB) and then downsize it (say, to 1TB), the filesystem<br>
metadata overhead will be quite big. The same thing happens if you ask<br>
for lots of inodes (here "lots" means more than the default, which<br>
is 1 inode per 16K of disk space). This happens because the ext4<br>
filesystem is not designed to shrink. Therefore, to have the lowest<br>
possible overhead you have to choose the initial filesystem size<br>
carefully. Yes, this is not a solution but a workaround.<br>
</blockquote>
<br></span>
As you can see from the inode count:<br>
<br>
# df -i<br>
Filesystem Inodes IUsed IFree IUse% Mounted on<br>
/dev/ploop55410p1 16777216 1198297 15578919 8% /<br>
<br>
The initial filesystem size was 256 GiB:<br>
<br>
(16777216 inodes * 16 KiB per inode) = 268435456 KiB = 256 GiB.<br>
<br>
The current filesystem size is also 256 GiB:<br>
<br>
# cat /etc/vz/conf/155.conf | grep DISKSPACE<br>
DISKSPACE="268435456:268435456"<br>
<br>
So there is no extra "filesystem metadata overhead" here.<br>
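The inode ratio can also be checked directly on the ploop device (device name as in the df output above):<br>
<br>
# tune2fs -l /dev/ploop55410p1 | grep -E 'Inode count|Block count|Block size'<br>
<br>
Block count * Block size / Inode count should come out to 16384 bytes per inode here, i.e. the ext4 default of one inode per 16K.<br>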
<br>
What am I doing wrong, and how can I decrease the ploop overhead here?<br>
<br>
I found only one way: migrate to ZFS with lz4 compression turned on.<span class=""><br>
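(For reference, that means keeping the container private areas on a ZFS dataset, roughly like this; pool and dataset names are only an example, and the containers then use the simfs layout instead of ploop images:)<br>
<br>
# zpool create tank /dev/sdb<br>
# zfs create -o compression=lz4 -o mountpoint=/vz/private tank/private<br>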
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Also note that ploop was not designed with any specific filesystem in<br>
mind; it is universal, so #3 can be solved by moving to a different fs in the future.<br>
</blockquote>
<br></span>
XFS currently does not support filesystem shrinking at all:<br>
<a href="http://xfs.org/index.php/Shrinking_Support" rel="noreferrer" target="_blank">http://xfs.org/index.php/Shrinking_Support</a><br>
<br>
Btrfs is not production-ready, and no variants other<br>
than ext4 will be available for use with ploop in the near future.<br>
<br>
ploop is great work, really. But it has some disadvantages:<br>
<a href="https://github.com/pavel-odintsov/OpenVZ_ZFS/blob/master/ploop_issues.md" rel="noreferrer" target="_blank">https://github.com/pavel-odintsov/OpenVZ_ZFS/blob/master/ploop_issues.md</a><span class=""><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
My experience with ZFS:<br>
<br>
real data used inside the container is about 62 GiB,<br>
real space used on the hard disk is about 11 GiB.<br>
</blockquote>
<br>
So, you are not even comparing apples to apples here. You just took two<br>
different containers, certainly of different sizes, probably also with<br>
different data sets and usage history. Not saying it's invalid, but if<br>
you want to have a meaningful (rather than anecdotal) comparison, you<br>
need to use the same data sets, the same operations on the data, etc.,<br>
try to optimize each case, and compare.<br>
</blockquote>
<br></span>
This is not a strict scientific comparison; mainly it is an illustration of<br>
how ploop creates overhead and how ZFS saves disk space using compression.<br>
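(The actual savings are easy to check per dataset; the dataset name below is just a placeholder:)<br>
<br>
# zfs get compressratio,logicalused,used tank/private/155<br>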
<br>
Even if you do a very strict comparison, the conclusion will be the same.<br>
<br>
=======================================================================<br>
<br>
P.S. Block-level deduplication is a dark corner, and I prefer not to use it<br>
with either ZFS or ploop, so I would rather leave it out of the picture, sorry.<br>
<br>
Also, you understand that if I install CentOS 7.0 inside containers<br>
using a common ploop image, and later upgrade the system to CentOS 7.1,<br>
all of those containers will end up with their own copies of the system files,<br>
and such artificial deduplication will in practice be turned off entirely.<div class="HOEnZb"><div class="h5"><br>
<br>
-- <br>
Best regards,<br>
Gena<br>
</div></div></blockquote></div><br></div>