[Users] ZFS vs ploop

Gena Makhomed gmm at csdoc.com
Fri Jul 24 05:41:25 PDT 2015


On 23.07.2015 5:44, Kir Kolyshkin wrote:

>> My experience with ploop:
>>
>> DISKSPACE limited to 256 GiB, real data used inside the container
>> was near 40-50% of the 256 GiB limit, but the ploop image is a lot
>> bigger; it uses nearly 256 GiB of space on the hardware node. Overhead ~ 50-60%
>>
>> I found a workaround for this: run "/usr/sbin/vzctl compact $CT"
>> via cron every night, and now the ploop image has less overhead.
>>
>> current state:
>>
>> on hardware node:
>>
>> # du -b /vz/private/155/root.hdd
>> 205963399961    /vz/private/155/root.hdd
>>
>> inside container:
>>
>> # df -B1
>> Filesystem               1B-blocks          Used    Available Use% Mounted on
>> /dev/ploop38149p1     270426705920  163129053184  94928560128  64% /
>>
>> ====================================
>>
>> used space, bytes: 163129053184
>>
>> image size, bytes: 205963399961
>>
>> "ext4 over ploop over ext4" solution disk space overhead is near 26%,
>> or is near 40 GiB, if see this disk space overhead in absolute numbers.
>>
>> This is the main disadvantage of ploop.
>>
>> And this disadvantage can't be avoided - it is "by design".
>
> To anyone reading this, there are a few things here worth noting.
>
> a. Such overhead is caused by three things:
> 1. creating then removing data (vzctl compact takes care of that)
> 2. filesystem fragmentation (we have some experimental patches to ext4
>    plus an ext4 defragmenter to solve it, but currently it's still in
>    the research stage)
> 3. initial filesystem layout (which depends on the initial ext4 fs size,
>    including the inode requirement)
>
> So, #1 is solved, #2 is solvable, and #3 is a limitation of the used
> file system and can be mitigated by properly choosing the initial size
> of a newly created ploop.

This container is compacted every night; during the working day only
new static files are added to it, so the container does not perform
many "creating then removing data" operations.

current state:

on hardware node:

# du -b /vz/private/155/root.hdd
203547480857    /vz/private/155/root.hdd

inside container:

# df -B1
Filesystem               1B-blocks          Used    Available Use% Mounted on
/dev/ploop55410p1     270426705920  163581190144  94476423168  64% /


used space, bytes: 163581190144

image size, bytes: 203547480857

overhead: ~ 37 GiB, ~ 19.6%
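
For reference, this is roughly how I compute the overhead numbers - a
minimal sketch, with the CTID hard-coded and "vzctl exec ... df" as just
one way to read the used bytes from inside the container:

CTID=155
IMG=$(du -sb /vz/private/$CTID/root.hdd | awk '{print $1}')     # image size on the hardware node
USED=$(vzctl exec $CTID df -B1 / | awk 'NR==2 {print $3}')      # used bytes inside the container
echo "overhead, bytes: $(( IMG - USED ))"
echo "overhead, %:     $(( (IMG - USED) * 100 / IMG ))"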

The container was compacted at 03:00
by the command /usr/sbin/vzctl compact 155

Running the compaction again right now gives:
9443 clusters have been relocated

result:

used space, bytes: 163604983808

image size, bytes: 193740149529

overhead: ~ 28 GiB, ~ 15.5%

I think it is not a good idea to run ploop compaction more frequently
than once per day at night - so, when planning disk space on the
hardware node for all ploop images, we need to take into account not
the minimal value of the overhead, but the maximal one, reached after
24 hours of the container working in normal mode.

So the real overhead of ploop can be measured only after at least
24 hours of the container being in the running state.
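
The nightly compaction itself is just a cron job; a sketch of what runs
here at 03:00 (the script path is made up; vzlist and vzctl are the
standard OpenVZ tools):

#!/bin/sh
# /usr/local/sbin/compact-all-ct.sh - run from cron at 03:00:
#   0 3 * * * root /usr/local/sbin/compact-all-ct.sh
# compact the ploop image of every running container
for CT in $(vzlist -H -o ctid); do
    /usr/sbin/vzctl compact "$CT"
done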

> An example of the #3 effect is this: if you create a very large filesystem
> initially (say, 16TB) and then downsize it (say, to 1TB), the filesystem
> metadata overhead will be quite big. The same thing happens if you ask
> for lots of inodes (here "lots" means more than the default value, which
> is 1 inode per 16K of disk space). This happens because the ext4
> filesystem is not designed to shrink. Therefore, to have the lowest
> possible overhead you have to choose the initial filesystem size
> carefully. Yes, this is not a solution but a workaround.

As you can see from the inode usage:

# df -i
Filesystem          Inodes   IUsed    IFree IUse% Mounted on
/dev/ploop55410p1 16777216 1198297 15578919    8% /

The initial filesystem size was 256 GiB:

(16777216 * 16 * 1024) / 1024.0 / 1024.0 / 1024.0 == 256 GiB.

The current filesystem size is also 256 GiB:

# cat /etc/vz/conf/155.conf | grep DISKSPACE
DISKSPACE="268435456:268435456"

So there is no extra "filesystem metadata overhead".
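
A quick sanity check of that arithmetic (DISKSPACE in the config is
given in 1 KiB blocks, so both numbers should come out to the same
byte count):

echo $(( 16777216 * 16 * 1024 ))   # inodes * 16 KiB of disk per inode = 274877906944 bytes
echo $(( 268435456 * 1024 ))       # DISKSPACE (1 KiB blocks) in bytes = 274877906944 bytes = 256 GiB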

What am I doing wrong, and how can I decrease the ploop overhead here?

I found only one way: migrate to ZFS with lz4 compression turned on.
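
For the record, turning the compression on is a one-liner; a sketch
assuming a hypothetical layout with one ZFS dataset per container,
e.g. "tank/ct/155":

zfs set compression=lz4 tank/ct/155             # enable lz4 for the container dataset
zfs get compression,compressratio tank/ct/155   # verify and watch the achieved ratio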

> Also note, that ploop was not designed with any specific filesystem in
> mind, it is universal, so #3 can be solved by moving to a different fs in the future.

XFS currently does not support filesystem shrinking at all:
http://xfs.org/index.php/Shrinking_Support

BTRFS is not production-ready, and no other variants except ext4
will be available for use with ploop in the near future.

ploop is a great piece of work, really. But it has some disadvantages:
https://github.com/pavel-odintsov/OpenVZ_ZFS/blob/master/ploop_issues.md

>> My experience with ZFS:
>>
>> real data used inside container near 62 GiB,
>> real space used on hard disk is near 11 GiB.
>
> So, you are not even comparing apples to apples here. You just took two
> different containers, certainly of different sizes, probably also with
> different data sets and usage history. Not saying it's invalid, but if
> you want to have a meaningful (rather than anecdotal) comparison, you
> need to use the same data sets, the same operations on the data, etc.,
> try to optimize each case, and compare.

This is not a strict scientific comparison; mainly it is an illustration
of how ploop creates overhead and how ZFS saves disk space using compression.

Even if you do a very strict comparison, the conclusion will be the same.
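
If someone wants to repeat the comparison on the same data set, the ZFS
side can be read off directly (the dataset name is again hypothetical);
on the ploop side the du/df pair shown above gives the corresponding numbers:

zfs get -p used,logicalused,compressratio tank/ct/155   # physical vs logical bytes and the ratio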

=======================================================================

P.S. Block-level deduplication is a dark corner, and I prefer not to use it
with either ZFS or ploop, so I prefer to leave it out of the picture, sorry.

Also, you understand that if I install CentOS 7.0 inside containers
using a common ploop image and later upgrade the system to CentOS 7.1,
all new containers will get their own copy of the system files, and such
artificial deduplication will practically be turned off altogether.

-- 
Best regards,
  Gena

