<div dir="ltr">Completely agree with Gena Makhomed on points he raised about ploop.<div><br></div><div>If you run HN on SSD (256gb for example) ploop is not good to use at all, too much overhead of space.</div><div><br></div><div>Would be nice to have better free space management in ploop somehow.</div><div><br></div><div>Also about OpenVZ restore option: </div><div><br></div><div>Here is real example from production environment: kannel (<a href="http://kannel.org">http://kannel.org</a>) is not working properly with OpenVZ suspend/restore feature (turned on by default), so had to use VE_STOP_MODE=stop instead.</div></div><div class="gmail_extra"><br><div class="gmail_quote">2015-07-24 15:41 GMT+03:00 Gena Makhomed <span dir="ltr"><<a href="mailto:gmm@csdoc.com" target="_blank">gmm@csdoc.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On 23.07.2015 5:44, Kir Kolyshkin wrote:<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
My experience with ploop:<br>
<br>
DISKSPACE limited to 256 GiB, real data used inside the container<br>
was around 40-50% of the 256 GiB limit, but the ploop image was a lot bigger:<br>
it used nearly 256 GiB of space on the hardware node. Overhead ~50-60%.<br>
<br>
I found a workaround for this: run "/usr/sbin/vzctl compact $CT"<br>
from cron every night, and now the ploop image has less overhead.<br>
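A root crontab entry along these lines would do it (the vzlist loop is just one way to cover all containers; adjust the schedule to taste):<br>
<br>
0 3 * * * for CT in $(vzlist -H -o ctid); do /usr/sbin/vzctl compact $CT; done<br>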
<br>
current state:<br>
<br>
on hardware node:<br>
<br>
# du -b /vz/private/155/root.hdd<br>
205963399961 /vz/private/155/root.hdd<br>
<br>
inside container:<br>
<br>
# df -B1<br>
Filesystem 1B-blocks Used Available Use% Mounted on<br>
/dev/ploop38149p1 270426705920 163129053184 94928560128 64% /<br>
<br>
====================================<br>
<br>
used space, bytes: 163129053184<br>
<br>
image size, bytes: 205963399961<br>
<br>
"ext4 over ploop over ext4" solution disk space overhead is near 26%,<br>
or is near 40 GiB, if see this disk space overhead in absolute numbers.<br>
<br>
This is the main disadvantage of ploop.<br>
<br>
And this disadvantage can't be avoided - it is "by design".<br>
</blockquote>
<br>
To anyone reading this, there are a few things here worth noting.<br>
<br>
a. Such overhead is caused by three things:<br>
1. creating then removing data (vzctl compact takes care of that)<br>
2. filesystem fragmentation (we have some experimental patches to ext4<br>
plus an ext4 defragmenter to solve it, but currently it's still in<br>
the research stage)<br>
3. initial filesystem layout (which depends on initial ext4 fs size,<br>
including inode requirement)<br>
<br>
So, #1 is solved, #2 is solvable, and #3 is a limitation of the underlying<br>
filesystem and can be mitigated<br>
by properly choosing the initial size of a newly created ploop.<br>
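In practice that means creating the container with a realistic disk size up front rather than oversizing and shrinking later; for example (CT ID, size and OS template below are only placeholders):<br>
<br>
# vzctl create 200 --layout ploop --ostemplate centos-7-x86_64 --diskspace 50G<br>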
</blockquote>
<br></div></div>
This container is compacted every night; during the working day<br>
only new static files are added to it, so this container does<br>
not perform many "create then remove data" operations.<span class=""><br>
<br>
current state:<br>
<br>
on hardware node:<br>
<br>
# du -b /vz/private/155/root.hdd<br></span>
203547480857 /vz/private/155/root.hdd<span class=""><br>
<br>
inside container:<br>
<br>
# df -B1<br>
Filesystem 1B-blocks Used Available Use% Mounted on<br></span>
/dev/ploop55410p1 270426705920 163581190144 94476423168 64% /<br>
<br>
<br>
used space, bytes: 163581190144<br>
<br>
image size, bytes: 203547480857<br>
<br>
overhead: ~ 37 GiB, ~ 19.6%<br>
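(For reference, the overhead here is simply the image size minus the used space; it can be checked with bc, for example:)<br>
<br>
# echo $(( 203547480857 - 163581190144 ))<br>
39966290713<br>
# echo "scale=1; 39966290713 / 2^30" | bc<br>
37.2<br>
# echo "scale=1; 100 * 39966290713 / 203547480857" | bc<br>
19.6<br>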
<br>
The container was compacted at 03:00<br>
by the command /usr/sbin/vzctl compact 155.<br>
<br>
Running container compaction again right now:<br>
9443 clusters have been relocated<br>
<br>
result:<br>
<br>
used space, bytes: 163604983808<br>
<br>
image size, bytes: 193740149529<br>
<br>
overhead: ~ 28 GiB, ~ 15.5%<br>
<br>
I think it is not a good idea to run ploop compaction more frequently<br>
than once per day, at night - so when planning disk space<br>
on the hardware node for all ploop images, we need to take into account<br>
not the minimal value of the overhead, but the maximal one,<br>
after 24 hours of the container working in normal mode.<br>
<br>
So the real overhead of ploop can be measured only<br>
after at least 24 hours of the container being in the running state.<span class=""><br>
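For that kind of planning it is enough to sum the actual image sizes on the hardware node, for example:<br>
<br>
# du -sbc /vz/private/*/root.hdd<br>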
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
An example of the #3 effect is this: if you create a very large filesystem<br>
initially (say, 16TB) and then downsize it (say, to 1TB), the filesystem<br>
metadata overhead will be quite big. The same thing happens if you ask<br>
for lots of inodes (here "lots" means more than the default, which<br>
is 1 inode per 16K of disk space). This happens because the ext4<br>
filesystem is not designed to shrink. Therefore, to have the lowest<br>
possible overhead you have to choose the initial filesystem size<br>
carefully. Yes, this is not a solution but a workaround.<br>
</blockquote>
<br></span>
As you can see from the inode count:<br>
<br>
# df -i<br>
Filesystem Inodes IUsed IFree IUse% Mounted on<br>
/dev/ploop55410p1 16777216 1198297 15578919 8% /<br>
<br>
The initial filesystem size was 256 GiB:<br>
<br>
(16777216 inodes * 16 KiB per inode) = 268435456 KiB = 256 GiB.<br>
<br>
The current filesystem size is also 256 GiB:<br>
<br>
# cat /etc/vz/conf/155.conf | grep DISKSPACE<br>
DISKSPACE="268435456:268435456"<br>
<br>
So there is no extra "filesystem metadata overhead" here.<br>
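The inode ratio can also be checked directly on the ploop device (device name as in the df output above):<br>
<br>
# tune2fs -l /dev/ploop55410p1 | grep -E 'Inode count|Block count|Block size'<br>
<br>
Block count * Block size / Inode count should come out to 16384 bytes per inode here, i.e. the ext4 default of one inode per 16K.<br>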
<br>
What am I doing wrong, and how can I decrease the ploop overhead here?<br>
<br>
I found only one way: migrate to ZFS with lz4 compression turned on.<span class=""><br>
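(For reference, that means keeping the container private areas on a ZFS dataset, roughly like this; pool and dataset names are only an example, and the containers then use the simfs layout instead of ploop images:)<br>
<br>
# zpool create tank /dev/sdb<br>
# zfs create -o compression=lz4 -o mountpoint=/vz/private tank/private<br>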
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Also note that ploop was not designed with any specific filesystem in<br>
mind; it is universal, so #3 can be solved by moving to a different fs in the future.<br>
</blockquote>
<br></span>
XFS currently does not support filesystem shrinking at all:<br>
<a href="http://xfs.org/index.php/Shrinking_Support" rel="noreferrer" target="_blank">http://xfs.org/index.php/Shrinking_Support</a><br>
<br>
Btrfs is not production-ready, and no variants other<br>
than ext4 will be available for use with ploop in the near future.<br>
<br>
ploop is great work, really. But it has some disadvantages:<br>
<a href="https://github.com/pavel-odintsov/OpenVZ_ZFS/blob/master/ploop_issues.md" rel="noreferrer" target="_blank">https://github.com/pavel-odintsov/OpenVZ_ZFS/blob/master/ploop_issues.md</a><span class=""><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
My experience with ZFS:<br>
<br>
real data used inside the container is about 62 GiB,<br>
real space used on the hard disk is about 11 GiB.<br>
</blockquote>
<br>
So, you are not even comparing apples to apples here. You just took two<br>
different containers, certainly of different sizes, probably also with<br>
different data sets and usage history. Not saying it's invalid, but if<br>
you want to have a meaningful (rather than anecdotal) comparison, you<br>
need to use the same data sets, the same operations on the data, etc.,<br>
try to optimize each case, and compare.<br>
</blockquote>
<br></span>
This is not a strict scientific comparison; mainly it is an illustration of<br>
how ploop creates overhead and how ZFS saves disk space using compression.<br>
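(The actual savings are easy to check per dataset; the dataset name below is just a placeholder:)<br>
<br>
# zfs get compressratio,logicalused,used tank/private/155<br>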
<br>
Even if you do a very strict comparison, the conclusion will be the same.<br>
<br>
=======================================================================<br>
<br>
P.S. Block-level deduplication is a dark corner, and I prefer not to use it<br>
with either ZFS or ploop, so I would rather leave it out of the picture, sorry.<br>
<br>
Also, you understand that if I install CentOS 7.0 inside containers<br>
using a common ploop image, and later upgrade the system to CentOS 7.1,<br>
all of those containers will end up with their own copies of the system files,<br>
and such artificial deduplication will in practice be turned off entirely.<div class="HOEnZb"><div class="h5"><br>
<br>
-- <br>
Best regards,<br>
Gena<br>
</div></div></blockquote></div><br></div>