[Users] ZFS vs ploop

Gena Makhomed gmm at csdoc.com
Fri Jul 24 17:57:21 PDT 2015


On 25.07.2015 1:06, Kir Kolyshkin wrote:

>> I think it is not a good idea to run ploop compaction more often
>> than once per day, at night - so when planning disk space on the
>> hardware node for all ploop images we need to take into account
>> not the minimal value of the overhead, but the maximal one, after
>> 24 hours of a container working in normal mode.
>>
>> So the real overhead of ploop can only be measured
>> after at least 24h of the container being in the running state.

One more problem: it is not possible to predict the size of a ploop
image after 24h of container operation. With many containers, the
disk space required on the hardware node can be predicted only with
some probability, even if the disk space actually used inside the
ploop images is considerably less than the total size of the images.

Possible workaround: add a trigger which runs online ploop image
compaction when the unused space inside a ploop image exceeds some
threshold, expressed as an absolute value and/or as a percentage of
the allocated image size. A second threshold: when free space on the
partition holding the ploop images gets low. Ideally, also take the
overall system I/O load into account and run this background
optimization at idle priority, while the system is not busy.
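
A rough sketch of such a trigger (only an illustration, assuming the
usual /vz layout and that "vzctl compact <CTID>" is available; the
paths and thresholds below are made up for the example):

#!/usr/bin/env python
# Compact ploop images whose unused (reclaimable) space exceeds a
# threshold, at idle I/O priority. Paths/thresholds are illustrative.
import os, glob, subprocess

ABS_THRESHOLD = 2 * 1024**3    # compact if >= 2 GiB looks reclaimable
PCT_THRESHOLD = 20             # ... or >= 20% of the allocated image

def image_overhead(ctid):
    # bytes actually allocated by the ploop image file(s) on the HN
    allocated = sum(os.stat(f).st_blocks * 512
                    for f in glob.glob('/vz/private/%s/root.hdd/*' % ctid))
    # bytes used inside the container (CT must be mounted/running)
    vfs = os.statvfs('/vz/root/%s' % ctid)
    used = (vfs.f_blocks - vfs.f_bfree) * vfs.f_frsize
    return allocated, used

def maybe_compact(ctid):
    allocated, used = image_overhead(ctid)
    unused = allocated - used
    if unused >= ABS_THRESHOLD or \
       (allocated and 100.0 * unused / allocated >= PCT_THRESHOLD):
        # idle I/O priority, so compaction does not compete with CT load
        subprocess.call(['ionice', '-c3', 'vzctl', 'compact', ctid])

for path in glob.glob('/vz/private/*'):
    ctid = os.path.basename(path)
    if ctid.isdigit():
        maybe_compact(ctid)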

Possible solution #1: don't use a filesystem for storing ploop images
at all, but use something like a zpool - a common pool for all ploop
images. Later it would be easy to add block-level deduplication and
defragmentation for all ploop images, and unused blocks from one
ploop image could be returned to the pool immediately, with no future
need to run ploop image compaction manually at all.

Possible solution #2: use ZFS zvols as backing storage and place the
container images on top of zvols - this also adds very efficient
snapshots and full-path data integrity checking with checksums.
It also adds transparent block-level compression and the ability
to build and use any level of "software RAID" without resorting to
legacy RAID with its "write hole" vulnerability:
http://www.raid-recovery-guide.com/raid5-write-hole.aspx
and many other existing/future bugs in hardware RAID firmware.
You also get, for free, the ability to accelerate reads with an L2ARC
and to turn synchronous writes into effectively asynchronous ones via
the ZIL, without integrity degradation and without performance
degradation!
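
A minimal sketch of what variant #2 could look like (pool name "tank",
CTID 101, the 10G size and the mount point are only assumptions for
illustration, not an existing OpenVZ feature):

#!/usr/bin/env python
# One zvol per container, ext4 on top, used instead of a ploop image.
import subprocess

def run(cmd):
    print(' '.join(cmd))
    subprocess.check_call(cmd)

ctid = '101'
zvol = 'tank/ct%s' % ctid
dev  = '/dev/zvol/%s' % zvol          # device node created by ZFS on Linux

run(['zfs', 'create', '-V', '10G',
     '-o', 'compression=lz4',         # transparent block-level compression
     zvol])
run(['mkfs.ext4', dev])
run(['mkdir', '-p', '/vz/private/%s' % ctid])
run(['mount', dev, '/vz/private/%s' % ctid])

# a snapshot of the whole container block device is then just:
#   zfs snapshot tank/ct101@before-upgrade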

Possible solution #3: use a ZFS filesystem for storing the CT
filesystem and get the complete set of ZFS features for free. Only
live migration needs additional work for seamless integration of
OpenVZ and ZFS.
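
For illustration only, variant #3 could look roughly like this (pool
name, CTID, quota and mount point are assumptions, and OpenVZ would
still need to be taught to use such a dataset as the CT private area):

#!/usr/bin/env python
# One ZFS dataset per container, mounted where the CT private area lives.
import subprocess

ctid = '101'

subprocess.check_call([
    'zfs', 'create',
    '-o', 'compression=lz4',                   # transparent compression
    '-o', 'quota=10G',                         # per-CT disk space limit
    '-o', 'mountpoint=/vz/private/%s' % ctid,
    'tank/private/%s' % ctid])

# per-container snapshots and send/receive then come for free, e.g.:
#   zfs snapshot tank/private/101@daily
#   zfs send tank/private/101@daily | ssh backup zfs receive backup/ct101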

Possible solution #4: implement both variants, #2 and #3, and allow
users to select which variant should be used for the entire CT tree
or for some subtree of the CT filesystem.

LLNL has already ported ZFS to Linux, so 99% of the work is already done:
http://zfsonlinux.org/docs/LUG11_ZFS_on_Linux_for_Lustre.pdf
http://zfsonlinux.org/docs.html

>> What am I doing wrong, and how can I decrease the ploop overhead here?
>
> Most probably it's because of filesystem defragmentation (my item #2
> above).
> We are currently working on that. For example, see this report:
>
>   https://lwn.net/Articles/637428/

Can this defragmentation tool be used in the "XFS over ploop over XFS"
case to defragment both filesystems - inside the ploop container and
on the HN?

Or will defragmentation be used only inside ploop,
to align the internal filesystem to ploop's 1MiB chunks?

-- 
Best regards,
  Gena

