[Users] ZFS vs ploop

spameden spameden at gmail.com
Fri Jul 24 06:31:57 PDT 2015


I completely agree with Gena Makhomed on the points he raised about ploop.

If you run the HN on an SSD (256 GB, for example), ploop is not a good choice
at all; the space overhead is too high.

It would be nice if ploop somehow had better free-space management.

Also, regarding the OpenVZ suspend/restore option:

Here is a real example from a production environment: kannel (http://kannel.org)
does not work properly with the OpenVZ suspend/restore feature (enabled by
default), so I had to use VE_STOP_MODE=stop instead.
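
For reference, the change itself is one line (a minimal sketch, assuming the
global /etc/vz/vz.conf is the right place on your install - check the vzctl
documentation for your version):

# grep VE_STOP_MODE /etc/vz/vz.conf
VE_STOP_MODE=stop

With that, stopping the vz service (or rebooting the node) shuts containers
down cleanly instead of suspending them and restoring them afterwards.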

2015-07-24 15:41 GMT+03:00 Gena Makhomed <gmm at csdoc.com>:

> On 23.07.2015 5:44, Kir Kolyshkin wrote:
>
>>> My experience with ploop:
>>>
>>> DISKSPACE was limited to 256 GiB and the real data used inside the container
>>> was near 40-50% of that 256 GiB limit, but the ploop image was a lot bigger,
>>> using nearly 256 GiB of space on the hardware node. Overhead ~ 50-60%.
>>>
>>> I found a workaround for this: run "/usr/sbin/vzctl compact $CT"
>>> via cron every night, and now the ploop image has less overhead.
>>>
>>> current state:
>>>
>>> on hardware node:
>>>
>>> # du -b /vz/private/155/root.hdd
>>> 205963399961    /vz/private/155/root.hdd
>>>
>>> inside container:
>>>
>>> # df -B1
>>> Filesystem               1B-blocks          Used    Available Use% Mounted on
>>> /dev/ploop38149p1     270426705920  163129053184  94928560128  64% /
>>>
>>> ====================================
>>>
>>> used space, bytes: 163129053184
>>>
>>> image size, bytes: 205963399961
>>>
>>> "ext4 over ploop over ext4" solution disk space overhead is near 26%,
>>> or is near 40 GiB, if see this disk space overhead in absolute numbers.
>>>
>>> This is the main disadvantage of ploop.
>>>
>>> And this disadvantage can't be avoided - it is "by design".
>>>
>>
>> To anyone reading this, there are a few things here worth noting.
>>
>> a. Such overhead is caused by three things:
>> 1. creating then removing data (vzctl compact takes care of that)
>> 2. filesystem fragmentation (we have some experimental patches to ext4
>>      plus an ext4 defragmenter to solve it, but currently it is still in
>> the research stage)
>> 3. initial filesystem layout (which depends on initial ext4 fs size,
>> including inode requirement)
>>
>> So, #1 is solved, #2 is solvable, and #3 is a limitation of the
>> filesystem used and can be mitigated by properly choosing the
>> initial size of a newly created ploop.
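>>
>> For example, something like this at creation time (vzctl 4.x syntax; the
>> template name and size here are only an illustration):
>>
>> # vzctl create 155 --ostemplate centos-7-x86_64 --diskspace 256G
>>
>> i.e. create the ploop image at its intended final size right away rather
>> than creating it large and shrinking it afterwards.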
>>
>
> This container is compacted every night; during the working day
> only new static files are added to it, so this container does
> not perform many "create then remove data" operations.
>
> current state:
>
> on hardware node:
>
> # du -b /vz/private/155/root.hdd
> 203547480857    /vz/private/155/root.hdd
>
> inside container:
>
> # df -B1
> Filesystem               1B-blocks          Used    Available Use% Mounted on
> /dev/ploop55410p1     270426705920  163581190144  94476423168  64% /
>
>
> used space, bytes: 163581190144
>
> image size, bytes: 203547480857
>
> overhead: ~ 37 GiB, ~ 19.6%
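>
> (The overhead figure is image size minus used space, taken relative to the
> image size: 203547480857 - 163581190144 = 39966290713 bytes, or ~37.2 GiB,
> and 39966290713 / 203547480857 is ~19.6%.)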
>
> The container was compacted at 03:00
> by the command /usr/sbin/vzctl compact 155.
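>
> The nightly compaction is just a one-line cron entry, for example as an
> /etc/cron.d file (the file name here is only an illustration):
>
> # cat /etc/cron.d/vzcompact
> 0 3 * * * root /usr/sbin/vzctl compact 155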
>
> Running container compaction again right now:
> 9443 clusters have been relocated
>
> result:
>
> used space, bytes: 163604983808
>
> image size, bytes: 193740149529
>
> overhead: ~ 28 GiB, ~ 15.5%
>
> I think it is not a good idea to run ploop compaction more frequently
> than once per day at night - so when planning disk space on the hardware
> node for all ploop images, we need to take into account not the minimal
> value of the overhead but the maximal one, reached after 24 hours of the
> container working in normal mode.
>
> So the real overhead of ploop can be measured only
> after at least 24 hours of the container being in the running state.
>
>> An example of the #3 effect is this: if you create a very large filesystem
>> initially (say, 16TB) and then downsize it (say, to 1TB), the filesystem
>> metadata overhead will be quite big. The same thing happens if you ask
>> for lots of inodes (here "lots" means more than the default value, which
>> is 1 inode per 16K of disk space). This happens because the ext4
>> filesystem is not designed to shrink. Therefore, to have the lowest
>> possible overhead you have to choose the initial filesystem size
>> carefully. Yes, this is not a solution but a workaround.
>>
>
> As you can see from the inode numbers:
>
> # df -i
> Filesystem Inodes IUsed IFree IUse% Mounted on
> /dev/ploop55410p1 16777216 1198297 15578919 8% /
>
> initial filesystem size was 256 GiB:
>
> (16777216 * 16 * 1024) / 1024.0 / 1024.0 / 1024.0 == 256 GiB.
>
> current filesystem size is also 256 GiB:
>
> # cat /etc/vz/conf/155.conf | grep DISKSPACE
> DISKSPACE="268435456:268435456"
>
> so there is no extra "filesystem metadata overhead".
>
> What am I doing wrong, and how can I decrease the ploop overhead here?
>
> I found only one way: migrate to ZFS with lz4 compression turned on.
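>
> Roughly like this (a sketch only - the pool/dataset name "tank/vz-private"
> is just an example):
>
> # zfs create -o compression=lz4 tank/vz-private
> # zfs get compressratio tank/vz-private
>
> The first command creates a dataset with lz4 compression for the container
> private areas, and the second shows how much space the compression is
> actually saving.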
>
>> Also note that ploop was not designed with any specific filesystem in
>> mind; it is universal, so #3 can be solved by moving to a different fs in
>> the future.
>>
>
> XFS currently does not support filesystem shrinking at all:
> http://xfs.org/index.php/Shrinking_Support
>
> BTRFS is not production-ready, and no variants other than ext4
> will be available for use with ploop in the near future.
>
> ploop is great work, really. But it has some disadvantages:
> https://github.com/pavel-odintsov/OpenVZ_ZFS/blob/master/ploop_issues.md
>
>>> My experience with ZFS:
>>>
>>> real data used inside the container is near 62 GiB,
>>> real space used on the hard disk is near 11 GiB.
>>>
>>
>> So, you are not even comparing apples to apples here. You just took two
>> different containers, certainly of different sizes, and probably also with
>> different data sets and usage history. I'm not saying it's invalid, but if
>> you want a meaningful (rather than anecdotal) comparison, you need to use
>> the same data sets, the same operations on the data, etc., try to optimize
>> each case, and then compare.
>>
>
> This is not a strict scientific comparison; mainly it is an illustration
> of how ploop creates overhead and how ZFS saves disk space using compression.
>
> Even if you do a very strict comparison, the conclusion will be the same.
>
> =======================================================================
>
> P.S. Block-level deduplication is a dark corner, and I prefer not to use it
> with either ZFS or ploop, so I prefer to leave it out of the picture, sorry.
>
> Also, you understand that if I install CentOS 7.0 inside containers
> using a common ploop image and later upgrade the system to CentOS 7.1,
> all the containers will get their own copies of the system files, and such
> artificial deduplication will effectively be switched off entirely.
>
>
> --
> Best regards,
>  Gena
> _______________________________________________
> Users mailing list
> Users at openvz.org
> https://lists.openvz.org/mailman/listinfo/users
>