<div dir="ltr">26G used is CTID and 43G root.hdd after vzctl compact are not the expected behavior or not?<br><div>Try to report it?</div><div><br></div><div>About big load to disks - I not seem what we can do with it, it look like expected behavior for compact, unfortunatly.</div><div><br></div><div>Why you are ignoring my arguments about ext4? It&#39;s filesystem from the history and it haven&#39;t so much modern features which &quot;must&quot; for file systems of 21 century. <br></div><div><span style="font-weight:bold;color:#cc0000"> </span>For example - Do ext4 have compression? Do ext4 have deduplication? Do ext4 have self-healing (or you want do checkout for mission-critcical data for hours) ?<br></div></div><div class="gmail_extra"><br><div class="gmail_quote">2015-07-23 16:22 GMT+03:00 Сергей Мамонов <span dir="ltr">&lt;<a href="mailto:mrqwer88@gmail.com" target="_blank">mrqwer88@gmail.com</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">And many added to bugzilla. And many already fixed from you and other guys from OpenVZ team.<div>But the all picture, unfortunately, it has not changed cardinally, yet. Some people afraid use it, yet.<br></div><div><br></div><div>PS And suspend container failed without iptables-save since 2007 year )</div><div><a href="https://bugzilla.openvz.org/show_bug.cgi?id=3154" target="_blank">https://bugzilla.openvz.org/show_bug.cgi?id=3154</a><br></div><div>When with not exist ip6tables-save it work correctly.</div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">2015-07-23 10:39 GMT+03:00 Kir Kolyshkin <span dir="ltr">&lt;<a href="mailto:kir@openvz.org" target="_blank">kir@openvz.org</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div text="#000000" bgcolor="#FFFFFF"><div><div>
    On 07/22/2015 11:59 PM, Сергей Мамонов wrote:<br>
    <blockquote type="cite">
      
      <div dir="ltr">
        <div>
          <div>
            <div>
              <div>
                <div>
                  <div>
                    <div>
                      <div>
                        <div>
                          &gt;1. creating then removing data (vzctl
                          compact takes care of that)<br>
                          &gt;So, #1 is solved<br>
                          <br>
                          <span lang="en"><span>Only partially in fact.<br>
                            </span></span></div>
                        <span lang="en"><span>1. Compact &quot;eat&quot;</span></span><span lang="en"><span><span lang="en"><span> a lot of
                                resources</span><span>, </span></span></span></span><span lang="en"><span>because of the</span> <span>heavy use of</span> <span>the
                            disk.<br>
                          </span></span></div>
                      <span lang="en"><span>2. You need compact your ploop very
                          very regulary.<br>
                          <br>
                        </span></span></div>
                    On our nodes, when we run compact every day, with
                    3-5T /vz/ daily delta about 4-20% of space!<br>
                  </div>
                  Every day it must clean 300 - 500+ Gb.<br>
                  <br>
                </div>
                And it clean not all, as example - <br>
                <br>
                <div><span></span></div>
                [root@evo12 ~]# vzctl compact 75685<br>
                Trying to find free extents bigger than 0 bytes<br>
                Waiting<br>
                Call FITRIM, for minlen=33554432<br>
                Call FITRIM, for minlen=16777216<br>
                Call FITRIM, for minlen=8388608<br>
                Call FITRIM, for minlen=4194304<br>
                Call FITRIM, for minlen=2097152<br>
                Call FITRIM, for minlen=1048576<br>
                0 clusters have been relocated<br>
                [root@evo12 ~]# ls -lhat
                /vz/private/75685/root.hdd/root.hdd<br>
                -rw------- 1 root root 43G Июл 20 20:45
                /vz/private/75685/root.hdd/root.hdd<br>
                [root@evo12 ~]# vzctl exec 75685 df -h /<br>
                Filesystem         Size  Used Avail Use% Mounted on<br>
                /dev/ploop32178p1   50G   26G   21G  56% /<br>
                [root@evo12 ~]# vzctl --version<br>
                vzctl version 4.9.2<br>
              </div>
            </div>
          </div>
        </div>
      </div>
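      <div>(A hedged illustration of how one could spot such image-vs-used gaps across all running containers; the one-liner is illustrative, not our actual tooling:)</div>
      <pre>
# compare each running CT's ploop image size on the host with the space used inside it
for CT in $(vzlist -H -o ctid); do
  img=$(du -sh /vz/private/$CT/root.hdd | cut -f1)
  used=$(vzctl exec $CT df -h / | awk 'NR==2 {print $3}')
  echo "CT $CT: image $img, used inside $used"
done
      </pre>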
    </blockquote>
    <br></div></div>
    This is either #2 or #3 from my list, or both.<span><br>
    <br>
    <blockquote type="cite">
      <div dir="ltr">
        <div>
          <div>
            <div>
              <div><br>
                &gt;My point was, the feature works fine for many people
                despite this bug.<span><br>
                </span></div>
              <span><br>
              </span></div>
            <span>Not fine, but we need it very much for
              migration and not. So anyway whe use it, </span><span lang="en"><span>we have no alternative in fact.<br>
              </span></span></div>
          <span lang="en"><span>And it one of bugs. Live migration regulary
              failed, because vzctl cannot restore container correctly
              after suspend.<br>
            </span></span></div>
      </div>
    </blockquote>
    <br></span>
    You really need to file bugs if you want fixes.<div><div><br>
    <br>
    <blockquote type="cite">
      <div dir="ltr">
        <div><span lang="en"><span>Cpt is pain in fact. But I want to belive, that
              CRIU fix everything =)<br>
              <br>
            </span></span></div>
        <div><span lang="en"><span>And ext4 only with ploop - not good  case, and
              not modern case too.<br>
            </span></span></div>
        <div><span lang="en"><span>As example on some big nodes we have few /vz/
              partition, because raid controller cannot push all disk in
              one raid10 logical device. And few /vz/ partition </span></span><span lang="en"><span>it is not comfortable. </span></span><br>
          <span lang="en"><span>And it is</span> <span>less flexible
              like one zpool as exapmle.<br>
            </span></span></div>
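        <div>(As a hedged sketch of the &quot;one zpool&quot; idea - the device names below are made up, not our real layout:)</div>
        <pre>
# one pool striped over several mirror pairs (RAID10-like),
# mounted as a single /vz instead of several partitions
zpool create vz mirror sda sdb mirror sdc sdd
zfs set mountpoint=/vz vz
        </pre>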
        <span lang="en"><span></span></span>
        <div>
          <div>
            <div>
              <div>
                <div><br>
                  <div>
                    <div>
                      <div><span lang="en"></span>
                        <table>
                          <tbody>
                            <tr>
                              <td style="width:100%"><br>
                              </td>
                            </tr>
                          </tbody>
                        </table>
                        <div>
                          <div>
                            <div><span lang="en"><span></span></span><span lang="en"><span></span></span><span lang="en"><span></span></span></div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">2015-07-23 5:44 GMT+03:00 Kir Kolyshkin
          <span dir="ltr">&lt;<a href="mailto:kir@openvz.org" target="_blank">kir@openvz.org</a>&gt;</span>:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span><br>
              <br>
              On 07/22/2015 10:08 AM, Gena Makhomed wrote:<br>
              <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                On 22.07.2015 8:39, Kir Kolyshkin wrote:<br>
                <br>
                <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    1) currently even suspend/resume does not work reliably:<br>
                    <a href="https://bugzilla.openvz.org/show_bug.cgi?id=2470" rel="noreferrer" target="_blank">https://bugzilla.openvz.org/show_bug.cgi?id=2470</a><br>
                    - I can&#39;t suspend and resume containers without bugs,<br>
                    and as a result I also can&#39;t use it for live migration.<br>
                  </blockquote>
                  <br>
                  Valid point, we need to figure it out. What I don&#39;t
                  understand<br>
                  is how lots of users are enjoying live migration
                  despite this bug.<br>
                  Personally, I have never come across this.<br>
                </blockquote>
                <br>
                Nevertheless, steps to reproduce the bug 100% of the time are provided in the bug report.<br>
              </blockquote>
              <br>
            </span>
            I was not saying anything about the bug report being
            bad/incomplete.<br>
            My point was, the feature works fine for many people despite
            this bug.<span><br>
              <br>
              <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <br>
                <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    2) I see in Google many bug reports about this feature:<br>
                    &quot;openvz live migration kernel panic&quot; - so I prefer to take<br>
                    planned downtime of containers at night instead<br>
                    of unexpected and very painful kernel panics and<br>
                    complete reboots in the middle of the working day<br>
                    (with data loss, data corruption and other &quot;amenities&quot;).<br>
                  </blockquote>
                  <br>
                  Unlike the previous item, which is valid, this is pure
                  FUD.<br>
                </blockquote>
                <br>
                Compare two situations:<br>
                <br>
                1) Live migration is not used at all<br>
                <br>
                2) Live migration is used and containers are migrated between HNs<br>
                <br>
                In which situation is the probability of getting a kernel panic higher?<br>
                <br>
                If you say &quot;the probabilities are equal&quot;, this means<br>
                that the OpenVZ live migration code has no errors at all.<br>
                <br>
                Is that plausible? Especially considering the volume and complexity<br>
                of the OpenVZ live migration code and the grandiosity of this task.<br>
                <br>
                If you say &quot;for (1) the probability is lower and for (2)<br>
                the probability is higher&quot; - that is exactly what I think.<br>
                <br>
                I don&#39;t use live migration because I don&#39;t want kernel panics.<br>
              </blockquote>
              <br>
            </span>
            Following your logic, if you don&#39;t want kernel panics, you
            might want<br>
            to not use advanced filesystems such as ZFS, not use
            containers,<br>
            cgroups, namespaces, etc. The ultimate solution here, of
            course,<br>
            is to not use the kernel at all -- this will totally
            guarantee no kernel<br>
            panics at all, ever.<br>
            <br>
            On a serious note, I find your logic flawed.<span><br>
              <br>
              <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <br>
                And you say that this is &quot;pure FUD&quot;? Why?<br>
              </blockquote>
              <br>
            </span>
            Because it is not based on your experience or correct
            statistics,<br>
            but rather on something you saw on Google followed by some<br>
            flawed logic.
            <div>
              <div><br>
                <br>
                <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <br>
                  <br>
                  <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      4) from a technical point of view, it is possible<br>
                      to do live migration using ZFS, so &quot;live migration&quot;<br>
                      is currently the only advantage of ploop over ZFS<br>
                    </blockquote>
                    <br>
                    I wouldn&#39;t say so. If you have some real world
                    comparison<br>
                    of zfs vs ploop, feel free to share. Like density or
                    performance<br>
                    measurements, done in a controlled environment.<br>
                  </blockquote>
                  <br>
                  Ok.<br>
                  <br>
                  My experience with ploop:<br>
                  <br>
                  DISKSPACE was limited to 256 GiB; real data used inside the container<br>
                  was near 40-50% of the 256 GiB limit, but the ploop image was a lot bigger -<br>
                  it used nearly 256 GiB of space on the hardware node. Overhead ~50-60%.<br>
                  <br>
                  I found a workaround for this: run &quot;/usr/sbin/vzctl compact $CT&quot;<br>
                  via cron every night, and now the ploop image has less overhead.<br>
                  <br>
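                  (A minimal sketch of such a cron job; the schedule, the file name and the use of vzlist are illustrative, not my actual setup:)<br>
                  <pre>
# /etc/cron.d/vz-compact  (hypothetical file name)
# compact every running container's ploop image each night at 03:30
30 3 * * * root for CT in $(vzlist -H -o ctid); do /usr/sbin/vzctl compact "$CT"; done
                  </pre>
                  <br>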
                  current state:<br>
                  <br>
                  on hardware node:<br>
                  <br>
                  # du -b /vz/private/155/root.hdd<br>
                  205963399961    /vz/private/155/root.hdd<br>
                  <br>
                  inside container:<br>
                  <br>
                  # df -B1<br>
                  Filesystem           1B-blocks          Used    Available Use% Mounted on<br>
                  /dev/ploop38149p1 270426705920  163129053184  94928560128  64% /<br>
                  <br>
                  ====================================<br>
                  <br>
                  used space, bytes: 163129053184<br>
                  <br>
                  image size, bytes: 205963399961<br>
                  <br>
                  The &quot;ext4 over ploop over ext4&quot; solution&#39;s disk space overhead is near 26%,<br>
                  or about 40 GiB if you look at this overhead in absolute numbers.<br>
                  <br>
                  This is the main disadvantage of ploop.<br>
                  <br>
                  And this disadvantage can&#39;t be avoided - it is &quot;by design&quot;.<br>
                </blockquote>
                <br>
              </div>
            </div>
            To anyone reading this, there are a few things here worth
            noting.<br>
            <br>
            a. Such overhead is caused by three things:<br>
            1. creating then removing data (vzctl compact takes care of
            that)<br>
            2. filesystem fragmentation (we have some experimental
            patches to ext4<br>
                plus an ext4 defragmenter to solve it, but currently
            it&#39;s still in research stage)<br>
            3. initial filesystem layout (which depends on initial ext4
            fs size, including inode requirement)<br>
            <br>
            So, #1 is solved, #2 is solvable, and #3 is a limitation of
            the underlying file system and can be mitigated<br>
            by properly choosing the initial size of a newly created ploop.<br>
            <br>
            An example of the #3 effect is this: if you create a very large
            filesystem initially (say, 16TB) and then<br>
            downsize it (say, to 1TB), the filesystem metadata overhead will
            be quite big. The same thing happens<br>
            if you ask for lots of inodes (here &quot;lots&quot; means more than the
            default value, which is 1 inode<br>
            per 16K of disk space). This happens because the ext4 filesystem
            is not designed to shrink.<br>
            Therefore, to have the lowest possible overhead you have to
            choose the initial filesystem size<br>
            carefully. Yes, this is not a solution but a workaround.<br>
            <br>
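            For illustration, a hedged sketch of what &quot;choosing the initial size carefully&quot; can look like (the CT ID, template and size below are made up):<br>
            <pre>
# create the container with roughly the disk size it will actually need,
# rather than creating it huge and shrinking it later
vzctl create 101 --ostemplate centos-7-x86_64 --diskspace 50G
            </pre>
            <br>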
            Also note that ploop was not designed with any specific
            filesystem in mind; it is<br>
            universal, so #3 can be solved by moving to a different fs
            in the future.<br>
            <br>
            Next thing: you can actually use shared base deltas for
            containers, and although this is not<br>
            enabled by default, it is quite possible and works in
            practice. The key is to create a base delta<br>
            and use it for multiple containers (via hardlinks).<br>
            <br>
            Here is a quick and dirty example:<br>
            <br>
            SRCID=50 # &quot;Donor&quot; container ID<br>
            vztmpl-dl centos-7-x86_64 # to make sure we use the latest template<br>
            vzctl create $SRCID --ostemplate centos-7-x86_64<br>
            vzctl snapshot $SRCID # the base delta becomes read-only; new writes go to a top delta<br>
            # for each new CT, hardlink the shared base delta and copy the rest (DiskDescriptor.xml, top delta)<br>
            for CT in $(seq 1000 1999); do \<br>
                  mkdir -p /vz/private/$CT/root.hdd /vz/root/$CT; \<br>
                  ln /vz/private/$SRCID/root.hdd/root.hdd /vz/private/$CT/root.hdd/root.hdd; \<br>
                  cp -nr /vz/private/$SRCID/root.hdd /vz/private/$CT/; \<br>
                  cp /etc/vz/conf/$SRCID.conf /etc/vz/conf/$CT.conf; \<br>
            done<br>
            vzctl set $SRCID --disabled yes --save # make sure we don&#39;t use the donor<br>
            <br>
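            A quick sanity check one could run afterwards (a hedged sketch; it assumes the donor ID and paths used above):<br>
            <pre>
# the shared base delta should now have 1001 hard links (donor + 1000 clones),
# and du counts the hardlinked base delta only once
stat -c '%h links: %n' /vz/private/50/root.hdd/root.hdd
du -sh /vz/private
            </pre>
            <br>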
            This will create 1000 containers (so make sure your host has enough RAM),<br>
            each having about 650 MB of files, so 650 GB in total. Host disk space used will be<br>
            about 650 + 1000*1 MB before start (i.e. about 2 GB), or about 650 + 1000*30 MB<br>
            after start (i.e. about 32 GB). So:<br>
            <br>
            real data used inside the containers is near 650 GB<br>
            real space used on the hard disk is near 32 GB<br>
            <br>
            So, 20x disk space savings, and this result is reproducible. Surely it will get worse<br>
            over time etc., and this way of using ploop is neither official nor supported/recommended,<br>
            but it&#39;s not the point here. The points are:<br>
             - this is a demonstration of what you could do with ploop<br>
             - this shows why you shouldn&#39;t trust any numbers<span><br>
              <br>
              <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
=======================================================================<br>
                <br>
                My experience with ZFS:<br>
                <br>
                real data used inside the container is near 62 GiB,<br>
                real space used on the hard disk is near 11 GiB.<br>
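                (For reference, figures like these can be read straight from ZFS itself; the dataset name below is only illustrative:)<br>
                <pre>
# logicalused = data as the container sees it, used = space actually consumed
zfs get -H -o property,value used,logicalused,compressratio vz/private/155
                </pre>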
              </blockquote>
              <br>
            </span>
            So, you are not even comparing apples to apples here. You
            just took two<br>
            different containers, certainly of different sizes, probably
            also different data sets<br>
            and usage history. Not saying it&#39;s invalid, but if you want
            to have a meaningful<br>
            (rather than anecdotal) comparison, you need to use the same
            data sets, the same<br>
            operations on the data, etc., try to optimize each case, and
            compare.
            <div>
              <div><br>
                <br>
                <br>
                <br>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
      <br>
    </blockquote>
    <br>
  </div></div></div>

<br>_______________________________________________<br>
Users mailing list<br>
<a href="mailto:Users@openvz.org" target="_blank">Users@openvz.org</a><br>
<a href="https://lists.openvz.org/mailman/listinfo/users" rel="noreferrer" target="_blank">https://lists.openvz.org/mailman/listinfo/users</a><br>
<br></blockquote></div><br></div>
</div></div></blockquote></div><br></div>