<div dir="ltr"><div><div><div><div><div><div><div><div><div>

&gt;1. creating then removing data (vzctl compact takes care of that)<br>&gt;So, #1 is solved<br><br><span id="result_box" class="" lang="en"><span class="">Only partially, in fact.<br></span></span></div><span id="result_box" class="" lang="en"><span class="">1. Compact &quot;eats&quot;</span></span><span id="result_box" class="" lang="en"><span class=""><span id="result_box" class="" lang="en"><span class=""> a lot of resources</span><span class="">, </span></span></span></span><span id="result_box" class="" lang="en"><span class="">because of the</span> <span class="">heavy use of</span> <span class="">the disk.<br></span></span></div><span id="result_box" class="" lang="en"><span class="">2. You need to compact your ploop images very regularly.<br><br></span></span></div>On our nodes, where we run compact every day and /vz/ holds 3-5 TB, the daily delta is about 4-20% of the space!<br></div>Every day it has to reclaim 300-500+ GB.<br><br></div>And it does not reclaim everything. For example:<br><br><div style="" class=""><span class=""></span></div>[root@evo12 ~]# vzctl compact 75685<br>Trying to find free extents bigger than 0 bytes<br>Waiting<br>Call FITRIM, for minlen=33554432<br>Call FITRIM, for minlen=16777216<br>Call FITRIM, for minlen=8388608<br>Call FITRIM, for minlen=4194304<br>Call FITRIM, for minlen=2097152<br>Call FITRIM, for minlen=1048576<br>0 clusters have been relocated<br>[root@evo12 ~]# ls -lhat /vz/private/75685/root.hdd/root.hdd<br>-rw------- 1 root root 43G Jul 20 20:45 /vz/private/75685/root.hdd/root.hdd<br>[root@evo12 ~]# vzctl exec 75685 df -h /<br>Filesystem         Size  Used Avail Use% Mounted on<br>/dev/ploop32178p1   50G   26G   21G  56% /<br>[root@evo12 ~]# vzctl --version<br>vzctl version 4.9.2<br><br>&gt;My point was, the feature works fine for many people despite this bug.<span class="im"><br></span></div><span class="im"><br></span></div><span class="im">Not fine, but we need it badly, for migration and more. 
So we use it anyway; </span><span tabindex="-1" id="result_box" class="" lang="en"><span class="">we have no alternative, in fact.<br></span></span></div><span tabindex="-1" id="result_box" class="" lang="en"><span class="">And this is one of the bugs: live migration regularly fails because vzctl cannot restore the container correctly after suspend.<br></span></span></div><div><span tabindex="-1" id="result_box" class="" lang="en"><span class="">CPT is a pain, in fact. But I want to believe that CRIU will fix everything =)<br><br></span></span></div><div><span tabindex="-1" id="result_box" class="" lang="en"><span class="">And ploop supporting only ext4 is not a good situation, and not a modern one either.<br></span></span></div><div><span tabindex="-1" id="result_box" class="" lang="en"><span class="">For example, on some big nodes we have several /vz/ partitions, because the RAID controller cannot put all the disks into one RAID10 logical device. And having several /vz/ partitions </span></span><span tabindex="-1" id="result_box" class="" lang="en"><span class="">is inconvenient. 
</span></span><br><span tabindex="-1" id="result_box" class="" lang="en"><span class="">It is also</span> <span class="">less flexible than, for example, a single zpool.<br></span></span></div><span tabindex="-1" id="result_box" class="" lang="en"><span class=""></span></span><div><div><div><div><div><br><div><div><div><span id="result_box" class="" lang="en"></span><table class=""><tbody><tr><td style="width:100%"></td></tr></tbody></table><div><div><div><span id="result_box" class="" lang="en"><span class=""></span></span><span id="result_box" class="" lang="en"><span class=""></span></span><span id="result_box" class="" lang="en"><span class=""></span></span></div></div></div></div></div></div></div></div></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">2015-07-23 5:44 GMT+03:00 Kir Kolyshkin <span dir="ltr">&lt;<a href="mailto:kir@openvz.org" target="_blank">kir@openvz.org</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>

<br>

On 07/22/2015 10:08 AM, Gena Makhomed wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

On 22.07.2015 8:39, Kir Kolyshkin wrote:<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

1) currently even suspend/resume does not work reliably:<br>

<a href="https://bugzilla.openvz.org/show_bug.cgi?id=2470" rel="noreferrer" target="_blank">https://bugzilla.openvz.org/show_bug.cgi?id=2470</a><br>

- I can&#39;t suspend and resume containers without bugs.<br>

and as a result - I also can&#39;t use it for live migration.<br>

</blockquote>

<br>

Valid point, we need to figure it out. What I don&#39;t understand<br>

is how lots of users are enjoying live migration despite this bug.<br>

Me, personally, I never came across this.<br>

</blockquote>

<br>

Nevertheless, steps that reproduce the bug 100% of the time are provided in the bug report.<br>

</blockquote>

<br></span>

I was not saying anything about the bug report being bad/incomplete.<br>

My point was, the feature works fine for many people despite this bug.<span class=""><br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

2) I see many bug reports about this feature on Google:<br>

&quot;openvz live migration kernel panic&quot; - so I prefer to schedule<br>

planned downtime of containers at night instead<br>

of unexpected and very painful kernel panics and<br>

complete reboots in the middle of the working day.<br>

(with data loss, data corruption and other &quot;amenities&quot;)<br>

</blockquote>

<br>

Unlike the previous item, which is valid, this is pure FUD.<br>

</blockquote>

<br>

Compare two situations:<br>

<br>

1) Live migration not used at all<br>

<br>

2) Live migration used and containers migrated between hardware nodes<br>

<br>

In which situation is the probability of a kernel panic higher?<br>

<br>

If you say &quot;the probabilities are equal&quot;, this means<br>

that the OpenVZ live migration code has no errors at all.<br>

<br>

Is that plausible? Especially given the volume and complexity of the<br>

OpenVZ live migration code and the sheer scale of this task.<br>

<br>

If you say &quot;for (1) the probability is lower and for (2)<br>

the probability is higher&quot; - this is the same as what I think.<br>

<br>

I don&#39;t use live migration because I don&#39;t want kernel panics.<br>

</blockquote>

<br></span>

Following your logic, if you don&#39;t want kernel panics, you might want<br>

to not use advanced filesystems such as ZFS, not use containers,<br>

cgroups, namespaces, etc. The ultimate solution here, of course,<br>

is to not use the kernel at all -- this will totally guarantee no kernel<br>

panics at all, ever.<br>

<br>

On a serious note, I find your logic flawed.<span class=""><br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

And you say that &quot;this is pure FUD&quot;? Why?<br>

</blockquote>

<br></span>

Because it is not based on your experience or correct statistics,<br>

but rather on something you saw on Google followed by some<br>

flawed logic.<div><div class="h5"><br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

4) from a technical point of view, it is possible<br>

to do live migration using ZFS, so &quot;live migration&quot;<br>

is currently the only advantage of ploop over ZFS<br>

</blockquote>

<br>

I wouldn&#39;t say so. If you have some real world comparison<br>

of zfs vs ploop, feel free to share. Like density or performance<br>

measurements, done in a controlled environment.<br>

</blockquote>

<br>

Ok.<br>

<br>

My experience with ploop:<br>

<br>

DISKSPACE was limited to 256 GiB, and the real data used inside the container<br>

was about 40-50% of that 256 GiB limit, but the ploop image was a lot bigger:<br>

it used nearly 256 GiB of space on the hardware node. Overhead ~ 50-60%<br>

<br>

I found a workaround for this: run &quot;/usr/sbin/vzctl compact $CT&quot;<br>

via cron every night, and now the ploop image has less overhead.<br>
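<br>
A minimal sketch of what such a nightly job could look like. The function name compact_all is hypothetical, and the sketch assumes that every running container uses ploop and that "vzlist -H -o ctid" prints one running container ID per line:<br>
<br>

```shell
# Hypothetical helper for a nightly cron job (not part of vzctl itself).
# Assumption: all running containers are ploop-based.
compact_all() {
    # vzlist -H -o ctid: one running container ID per line, no header.
    for ct in $(vzlist -H -o ctid); do
        # Keep going if one container fails, but report it.
        vzctl compact "$ct" || echo "compact failed for CT $ct"
    done
}
```

<br>
Calling compact_all from a script placed in /etc/cron.daily would be one way to schedule it. Since compact is disk-heavy, the containers are processed one after another rather than in parallel.<br>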

<br>

current state:<br>

<br>

on hardware node:<br>

<br>

# du -b /vz/private/155/root.hdd<br>

205963399961    /vz/private/155/root.hdd<br>

<br>

inside container:<br>

<br>

# df -B1<br>

Filesystem               1B-blocks          Used    Available Use% Mounted on<br>

/dev/ploop38149p1     270426705920  163129053184  94928560128  64% /<br>

<br>

====================================<br>

<br>

used space, bytes: 163129053184<br>

<br>

image size, bytes: 205963399961<br>

<br>

The disk space overhead of the &quot;ext4 over ploop over ext4&quot; solution is about 26%,<br>

or about 40 GiB in absolute numbers.<br>
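<br>
Those two figures can be re-derived from the du and df output above. A small sketch follows; the helper names are hypothetical, and the overhead is taken relative to the used space, which is the reading that gives 26%:<br>
<br>

```shell
# Figures copied from the du/df output above (bytes).
used=163129053184    # "Used" column of df -B1 inside the container
image=205963399961   # du -b of root.hdd on the hardware node

# Hypothetical helpers; integer math, rounded to the nearest unit.
# Arguments: $1 = used bytes, $2 = image bytes.
overhead_bytes() { echo $(( $2 - $1 )); }
overhead_pct()   { echo $(( (($2 - $1) * 100 + $1 / 2) / $1 )); }
overhead_gib()   { echo $(( ($2 - $1 + 536870912) / 1073741824 )); }

echo "overhead: $(overhead_bytes "$used" "$image") bytes"
echo "overhead: $(overhead_pct "$used" "$image")% of used space"
echo "overhead: about $(overhead_gib "$used" "$image") GiB"
```

<br>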

<br>

This is the main disadvantage of ploop.<br>

<br>

And this disadvantage can&#39;t be avoided - it is &quot;by design&quot;.<br>

</blockquote>

<br></div></div>

To anyone reading this, there are a few things here worth noting.<br>

<br>

a. Such overhead is caused by three things:<br>

1. creating then removing data (vzctl compact takes care of that)<br>

2. filesystem fragmentation (we have some experimental patches to ext4<br>

    plus an ext4 defragmenter to solve it, but currently it&#39;s still in research stage)<br>

3. initial filesystem layout (which depends on initial ext4 fs size, including inode requirement)<br>

<br>

So, #1 is solved, #2 is solvable, and #3 is a limitation of the file system used and can be mitigated<br>

by properly choosing the initial size of a newly created ploop.<br>

<br>

An example of the #3 effect is this: if you create a very large filesystem initially (say, 16TB) and then<br>

downsize it (say, to 1TB), filesystem metadata overhead will be quite big. The same thing happens<br>

if you ask for lots of inodes (here &quot;lots&quot; means more than the default value, which is 1 inode<br>

per 16K of disk space). This happens because the ext4 filesystem is not designed to shrink.<br>

Therefore, to have lowest possible overhead you have to choose the initial filesystem size<br>

carefully. Yes, this is not a solution but a workaround.<br>

<br>

Also note that ploop was not designed with any specific filesystem in mind; it is<br>

universal, so #3 can be solved by moving to a different fs in the future.<br>

<br>

Next, you can actually use shared base deltas for containers. Although this is not<br>

enabled by default, it is quite possible and works in practice. The key is to create a base delta<br>

and use it for multiple containers (via hardlinks).<br>

<br>

Here is a quick and dirty example:<br>

<br>

SRCID=50 # &quot;Donor&quot; container ID<br>

vztmpl-dl centos-7-x86_64 # to make sure we use the latest<br>

vzctl create $SRCID --ostemplate centos-7-x86_64<br>

vzctl snapshot $SRCID<br>

for CT in $(seq 1001 2000); do \<br>

      mkdir -p /vz/private/$CT/root.hdd /vz/root/$CT; \<br>

      ln /vz/private/$SRCID/root.hdd/root.hdd /vz/private/$CT/root.hdd/root.hdd; \<br>

      cp -nr /vz/private/$SRCID/root.hdd /vz/private/$CT/; \<br>

      cp /etc/vz/conf/$SRCID.conf /etc/vz/conf/$CT.conf; \<br>

   done<br>

vzctl set $SRCID --disabled yes --save # make sure we don&#39;t use it<br>

<br>

This will create 1000 containers (so make sure your host has enough RAM),<br>

each holding about 650 MB of files, so 650 GB in total. Host disk space used will be<br>

about 650 + 1000*1 MB before start (i.e. about 2 GB), or about 650 + 1000*30 MB<br>

after start (i.e. about 32 GB). So:<br>

<br>

real data used inside containers near 650 GB<br>

real space used on hard disk is near 32 GB<br>
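<br>
The estimate above can be written out as a short calculation. This is only a sketch using the approximate per-container figures from this message (1 MB before start, 30 MB after start); the variable names are made up for illustration:<br>
<br>

```shell
# Approximate numbers taken from the message above; all sizes in MB.
n=1000            # containers sharing the base delta
base_mb=650       # shared base delta, stored once via hardlinks
idle_mb=1         # rough private data per container before start
started_mb=30     # rough private data per container after start

logical_mb=$(( n * base_mb ))              # what the containers see in total
before_mb=$(( base_mb + n * idle_mb ))     # host usage before start
after_mb=$(( base_mb + n * started_mb ))   # host usage after start
ratio=$(( logical_mb / after_mb ))         # integer savings factor

echo "logical: ${logical_mb} MB, before start: ${before_mb} MB, after start: ${after_mb} MB"
echo "savings after start: about ${ratio}x"
```

<br>
With these inputs the factor comes out at roughly 20x; it shrinks as each container accumulates its own private data.<br>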

<br>

So, 20x disk space savings, and this result is reproducible. Surely it will get worse<br>

over time etc., and this way of using ploop is neither official nor supported/recommended,<br>

but it&#39;s not the point here. The points are:<br>

 - this is a demonstration of what you could do with ploop<br>

 - this shows why you shouldn&#39;t trust any numbers<span class=""><br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

=======================================================================<br>

<br>

My experience with ZFS:<br>

<br>

real data used inside container near 62 GiB,<br>

real space used on hard disk is near 11 GiB.<br>

</blockquote>

<br></span>

So, you are not even comparing apples to apples here. You just took two<br>

different containers, certainly of different sizes, probably also different data sets<br>

and usage history. Not saying it&#39;s invalid, but if you want to have a meaningful<br>

(rather than anecdotal) comparison, you need to use same data sets, same<br>

operations on data etc., try to optimize each case, and compare<div class="HOEnZb"><div class="h5"><br>

<br>

<br>

<br>

_______________________________________________<br>

Users mailing list<br>

<a href="mailto:Users@openvz.org" target="_blank">Users@openvz.org</a><br>

<a href="https://lists.openvz.org/mailman/listinfo/users" rel="noreferrer" target="_blank">https://lists.openvz.org/mailman/listinfo/users</a><br>

</div></div></blockquote></div><br></div>