[Users] Shortest guide about running OpenVZ containers on top of ZFS

Pavel Snajdr lists at snajpa.net
Thu Nov 13 01:06:49 PST 2014


On 11/13/2014 10:49 AM, Devon B. wrote:
> Are you talking about ZFS in general or ZoL?  The problems I've seen
> with ZoL are the performance inconsistencies (review GitHub for details),
> redundant caching, and of course user quotas (workaround is zvols but
> zvols seem to have their own issues in current ZoL releases).  As for
> redundant caching, data seems to be cached by the Linux page cache and
> ARC leaving less available for both.  I have yet to talk to someone who
> is using ZoL in production for OpenVZ.

I'm using ZoL together with OpenVZ, so I would probably be the guy to
talk to :)
And I've seen the issues, though since 0.6.3 they're mostly gone. There
are some mutex contention problems still to be addressed, but you have to
have a specific workload to hit those. And they should be fixed by the
time 0.6.4 comes out.

I can't exactly see how ZVOLs are a workaround for quotas - you're
creating a different FS on top of a ZVOL, which sort of nullifies most of
the ZFS goodies (having a point-in-time consistent snapshot comes to mind
first).
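
To make the alternative concrete: instead of a ZVOL per container, each CT
gets its own dataset with a quota, and snapshots stay native ZFS snapshots.
Below is a rough Python sketch that only builds the zfs command lines and
does not run anything. The parent dataset "vz/private" and the helper names
are made up for the example; they are not something OpenVZ or vzctl
prescribes.

```python
# Sketch only: builds argv lists for the zfs CLI, nothing is executed.
# The parent dataset "vz/private" and all helper names are hypothetical.

PARENT = "vz/private"


def ct_dataset(ctid):
    """Name of the (hypothetical) per-container dataset."""
    return f"{PARENT}/{ctid}"


def create_ct_dataset(ctid, quota):
    """zfs create with a quota; other properties are inherited from PARENT."""
    return ["zfs", "create", "-o", f"quota={quota}", ct_dataset(ctid)]


def resize_ct(ctid, quota):
    """Change the size limit of an existing container dataset."""
    return ["zfs", "set", f"quota={quota}", ct_dataset(ctid)]


def snapshot_ct(ctid, label):
    """Point-in-time snapshot of one container's dataset."""
    return ["zfs", "snapshot", f"{ct_dataset(ctid)}@{label}"]


if __name__ == "__main__":
    for argv in (
        create_ct_dataset(101, "20G"),
        resize_ct(101, "40G"),
        snapshot_ct(101, "nightly"),
    ):
        print(" ".join(argv))
```

Feeding these lists to subprocess.run() on a real node is left out on
purpose, check the dataset names against your own layout first.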

As for redundant caching, the only thing cached redundantly is the
dentry cache, which in the case of ZFS doesn't make sense, but it sits so
high in the VFS layer of the kernel that it can't be turned off. However,
the page cache doesn't apply to ZFS, so data is not cached twice.
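
If you want to check this on your own node, here is a quick sketch that
puts the ARC size next to the page cache size. It assumes the arcstats
kstat file that ZoL exposes under /proc/spl/kstat/zfs/ and the usual
/proc/meminfo layout; the function names are just for illustration. Keep in
mind that the "Cached" figure also covers any non-ZFS filesystems and tmpfs
on the box, so treat the comparison as a rough indicator only.

```python
# Sketch: compare ARC size with the Linux page cache on a ZoL node.
# Function names are hypothetical; paths are the ones ZoL/Linux expose.

ARCSTATS = "/proc/spl/kstat/zfs/arcstats"
MEMINFO = "/proc/meminfo"


def parse_arcstats(text):
    """Parse the 'name type data' rows of arcstats into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly three fields and a numeric value;
        # the two header lines do not match and are skipped.
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def parse_meminfo_kb(text):
    """Parse lines like 'Cached:   2048 kB' into a dict of kB values."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def report(arc_path=ARCSTATS, meminfo_path=MEMINFO):
    """Return (ARC size, page cache size), both in MiB."""
    with open(arc_path) as f:
        arc = parse_arcstats(f.read())
    with open(meminfo_path) as f:
        mem = parse_meminfo_kb(f.read())
    return arc["size"] // 2**20, mem["Cached"] // 1024
```

Calling report() on the hardware node returns the two numbers in MiB.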

On a related note, I'd be grateful if the dentries could be prevented
from being accounted into the CT's memory. I've looked into the kernel
code and there's no easy flip switch. I haven't had the balls to cut it
out of there by commenting it out yet. Though if nobody has a better
idea, that's what I'm going to do during the Christmas holiday.

/snajpa

> 
>> Pavel Snajdr <mailto:lists at snajpa.net>
>> Thursday, November 13, 2014 3:21 AM
>> Oh, again, this debate always goes on and on :)
>>
>> Guys, try ZFS yourselves and come back here :)
>>
>> You obviously haven't seen ARC caching in action. You haven't played
>> with snapshots. You haven't seen what the online compression can do.
>>
>> Etc., etc., etc.
>>
>> There's lots to ZFS which BTRFS will never even remotely approach.
>>
>> Try having this config:
>>
>> - 300+ containers on a single node
>> - 128G RAM
>> - 6 spindles, 2 SSDs
>> - run MySQL on at least 50 of the containers
>>
>> Not only is it way faster than anything you could do with ext4, even
>> if it's split via ploop into smaller filesystems, it is also much, much
>> easier to manage. ZFS has a tree structure of filesystems with property
>> inheritance.
>>
>> It's designed to be The Solution for situations exactly like this one.
>>
>> The only shortcoming I can really see and mention from my experience of
>> running an OpenVZ based hosting with 850 active CTs on top of ZFS, is
>> that it lacks the support for dquota.
>>
>> I've looked into integrating dquota with ZFS, but it's such an utter
>> mess of an invention that I quickly changed my mind, and instead
>> we're just creating more datasets (== subvolumes in BTRFS). They are
>> really inexpensive (16kB each), can have their own size limits (quotas
>> in ZFS lingo), and thanks to the tree structure with inheritance they
>> are easy to manage.
>>
>> Also, forget about rsync and all that crap. Send/receive kills it with
>> ease.
>>
>> /snajpa
>>
>> _______________________________________________
>> Users mailing list
>> Users at openvz.org
>> https://lists.openvz.org/mailman/listinfo/users
>> Scott Dowdle <mailto:dowdle at montanalinux.org>
>> Wednesday, November 12, 2014 2:48 PM
>> Greetings,
>>
>> ----- Original Message -----
>>
>> Performance issues aren't the only problem ploop solves... it also
>> solves the changing inode issue. When a container is migrated from one
>> host to another with simfs, inodes will change... and some services
>> don't like that. Also because the size of a ploop disk image is fixed
>> (although changeable), the fixed size acts as a quota... so you get
>> your quota back if you turned it off.
>>
>> For me, unless something changes, ZFS isn't a starter because almost
>> no one ships with it because of licensing issues.
>>
>> How about btrfs? I don't think btrfs is available easily in the
>> existing OpenVZ kernels... nor in a modular format (like ZFS) so we
>> might have to wait until the availability of a RHEL7-based OpenVZ
>> branch. Red Hat still considers btrfs experimental but that may change
>> with upcoming RHEL7 updates. Both SUSE and Oracle have been using
>> btrfs for some time although they do not support btrfs' entire feature
>> set... they stick with the basic features and avoid the less mature
>> ones. Luckily that includes mirror, checksums, snapshotting,
>> subvolumes, etc.
>>
>> I wouldn't put simfs and ploop in the same column as the underlying
>> filesystems.
>>
>> I'm not sure why the chart says that simfs has issues with migration.
>> Other than the inode issue, which isn't an issue with the services I
>> run, simfs actually migrates faster because it doesn't have to
>> transfer the entire disk image... and if the host has been migrated
>> before and has a previous copy of its filesystem available, only the
>> changed files have to be transferred... saving a lot of time.
>>
>> TYL,