<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    On 07/22/2015 11:59 PM, Сергей Мамонов wrote:<br>
    <blockquote
cite="mid:CAG2oxtqCTVxSbiCzX6mB9qjhjU3fxVTi5FHR-Wr6cEvByddyzg@mail.gmail.com"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div>
          <div>
            <div>
              <div>
                <div>
                  <div>
                    <div>
                      <div>
                        <div>
                          &gt;1. creating then removing data (vzctl
                          compact takes care of that)<br>
                          &gt;So, #1 is solved<br>
                          <br>
                          Only partially, in fact.<br>
                        </div>
                        1. Compact "eats" a lot of resources, because
                        of its heavy use of the disk.<br>
                      </div>
                    2. You need to compact your ploop very, very
                    regularly.<br>
                    <br>
                  </div>
                    On our nodes, where we run compact every day, the
                    daily delta on a 3-5 TB /vz/ is about 4-20% of the
                    space!<br>
                  </div>
                  Every day it has to clean 300-500+ GB.<br>
                  <br>
                </div>
                And even then it does not clean everything. For
                example:<br>
                <br>
                [root@evo12 ~]# vzctl compact 75685<br>
                Trying to find free extents bigger than 0 bytes<br>
                Waiting<br>
                Call FITRIM, for minlen=33554432<br>
                Call FITRIM, for minlen=16777216<br>
                Call FITRIM, for minlen=8388608<br>
                Call FITRIM, for minlen=4194304<br>
                Call FITRIM, for minlen=2097152<br>
                Call FITRIM, for minlen=1048576<br>
                0 clusters have been relocated<br>
                [root@evo12 ~]# ls -lhat
                /vz/private/75685/root.hdd/root.hdd<br>
                -rw------- 1 root root 43G Jul 20 20:45
                /vz/private/75685/root.hdd/root.hdd<br>
                [root@evo12 ~]# vzctl exec 75685 df -h /<br>
                Filesystem         Size  Used Avail Use% Mounted on<br>
                /dev/ploop32178p1   50G   26G   21G  56% /<br>
                [root@evo12 ~]# vzctl --version<br>
                vzctl version 4.9.2<br>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    This is either #2 or #3 from my list, or both.<br>
    <br>
    <blockquote
cite="mid:CAG2oxtqCTVxSbiCzX6mB9qjhjU3fxVTi5FHR-Wr6cEvByddyzg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div>
          <div>
            <div>
              <div><br>
                &gt;My point was, the feature works fine for many people
                despite this bug.<span class="im"><br>
                </span></div>
              <span class="im"><br>
              </span></div>
            Not fine, but we need it very much for migration and not
            only that. So we use it anyway; in fact, we have no
            alternative.<br>
          </div>
          And this is one of those bugs: live migration regularly
          fails because vzctl cannot restore the container correctly
          after suspend.<br>
        </div>
      </div>
    </blockquote>
    <br>
    You really need to file bugs if you want fixes.<br>
    <br>
    <blockquote
cite="mid:CAG2oxtqCTVxSbiCzX6mB9qjhjU3fxVTi5FHR-Wr6cEvByddyzg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div>CPT is a pain, in fact. But I want to believe that CRIU
          will fix everything =)<br>
          <br>
        </div>
        <div>And ext4-only under ploop is not a good setup, and not a
          modern one either.<br>
        </div>
        <div>For example, on some big nodes we have several /vz/
          partitions, because the RAID controller cannot put all the
          disks into one RAID10 logical device. And several /vz/
          partitions are not comfortable to manage, and less flexible
          than a single zpool, for example.<br>
        </div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">2015-07-23 5:44 GMT+03:00 Kir Kolyshkin
          <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:kir@openvz.org" target="_blank">kir@openvz.org</a>&gt;</span>:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex"><span
              class=""><br>
              <br>
              On 07/22/2015 10:08 AM, Gena Makhomed wrote:<br>
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">
                On 22.07.2015 8:39, Kir Kolyshkin wrote:<br>
                <br>
                <blockquote class="gmail_quote" style="margin:0 0 0
                  .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <blockquote class="gmail_quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    1) currently even suspend/resume does not work reliably:<br>
                    <a moz-do-not-send="true"
                      href="https://bugzilla.openvz.org/show_bug.cgi?id=2470"
                      rel="noreferrer" target="_blank">https://bugzilla.openvz.org/show_bug.cgi?id=2470</a><br>
                    - I can't suspend and resume containers without
                    hitting bugs,<br>
                    and as a result I also can't use it for live
                    migration.<br>
                  </blockquote>
                  <br>
                  Valid point, we need to figure it out. What I don't
                  understand<br>
                  is how lots of users are enjoying live migration
                  despite this bug.<br>
                  Me, personally, I never came across this.<br>
                </blockquote>
                <br>
                Nevertheless, steps to reproduce the bug 100% of the
                time are provided in the bug report.<br>
              </blockquote>
              <br>
            </span>
            I was not saying anything about the bug report being
            bad/incomplete.<br>
            My point was, the feature works fine for many people despite
            this bug.<span class=""><br>
              <br>
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <br>
                <blockquote class="gmail_quote" style="margin:0 0 0
                  .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <blockquote class="gmail_quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    2) I see on Google many bug reports about this
                    feature:<br>
                    "openvz live migration kernel panic" - so I prefer
                    to do<br>
                    planned downtime of containers at night instead<br>
                    of unexpected and very painful kernel panics and<br>
                    complete reboots in the middle of the working day<br>
                    (with data loss, data corruption and other
                    "amenities").<br>
                  </blockquote>
                  <br>
                  Unlike the previous item, which is valid, this is pure
                  FUD.<br>
                </blockquote>
                <br>
                Compare two situations:<br>
                <br>
                1) Live migration not used at all<br>
                <br>
                2) Live migration used and containers migrated between
                HN<br>
                <br>
                In which situation is the probability of getting a
                kernel panic higher?<br>
                <br>
                If you say "the probabilities are equal", this means<br>
                that the OpenVZ live migration code has no errors at all.<br>
                <br>
                Is that plausible? Especially when you consider the
                volume and complexity<br>
                of the OpenVZ live migration code, and the grandiosity
                of this task.<br>
                <br>
                If you say "for (1) the probability is lower and for (2)<br>
                the probability is higher" - that is the same as what I
                think.<br>
                <br>
                I don't use live migration because I don't want kernel
                panics.<br>
              </blockquote>
              <br>
            </span>
            Following your logic, if you don't want kernel panics, you
            might want<br>
            to not use advanced filesystems such as ZFS, not use
            containers,<br>
            cgroups, namespaces, etc. The ultimate solution here, of
            course,<br>
            is to not use the kernel at all -- this will totally
            guarantee no kernel<br>
            panics at all, ever.<br>
            <br>
            On a serious note, I find your logic flawed.<span class=""><br>
              <br>
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <br>
                And you say that this is "pure FUD"? Why?<br>
              </blockquote>
              <br>
            </span>
            Because it is not based on your experience or correct
            statistics,<br>
            but rather on something you saw on Google followed by some<br>
            flawed logic.
            <div>
              <div class="h5"><br>
                <br>
                <blockquote class="gmail_quote" style="margin:0 0 0
                  .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <br>
                  <br>
                  <blockquote class="gmail_quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    <blockquote class="gmail_quote" style="margin:0 0 0
                      .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      4) from a technical point of view it is possible<br>
                      to do live migration using ZFS, so "live
                      migration"<br>
                      is currently the only advantage of ploop over ZFS<br>
                    </blockquote>
                    <br>
                    I wouldn't say so. If you have some real world
                    comparison<br>
                    of zfs vs ploop, feel free to share. Like density or
                    performance<br>
                    measurements, done in a controlled environment.<br>
                  </blockquote>
                  <br>
                  Ok.<br>
                  <br>
                  My experience with ploop:<br>
                  <br>
                  DISKSPACE was limited to 256 GiB, and the real data
                  used inside the container<br>
                  was near 40-50% of the 256 GiB limit, but the ploop
                  image was a lot bigger:<br>
                  it used nearly 256 GiB of space on the hardware node.
                  Overhead ~ 50-60%.<br>
                  <br>
                  I found a workaround for this: run "/usr/sbin/vzctl
                  compact $CT"<br>
                  via cron every night, and now the ploop image has
                  less overhead.<br>
                  <br>
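                  A minimal sketch of such a nightly job (the cron file
                  path is hypothetical, and it assumes every running
                  container on the node should be compacted):<br>
                  <br>
                  # /etc/cron.d/vzcompact - compact all running containers at 03:00<br>
                  0 3 * * * root for CT in $(/usr/sbin/vzlist -H -o ctid); do /usr/sbin/vzctl compact $CT; done<br>
                  <br>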
                  current state:<br>
                  <br>
                  on the hardware node:<br>
                  <br>
                  # du -b /vz/private/155/root.hdd<br>
                  205963399961    /vz/private/155/root.hdd<br>
                  <br>
                  inside the container:<br>
                  <br>
                  # df -B1<br>
                  Filesystem           1B-blocks          Used     Available Use% Mounted on<br>
                  /dev/ploop38149p1 270426705920  163129053184   94928560128  64% /<br>
                  <br>
                  ====================================<br>
                  <br>
                  used space, bytes: 163129053184<br>
                  <br>
                  image size, bytes: 205963399961<br>
                  <br>
                  The disk space overhead of the "ext4 over ploop over
                  ext4" solution is near 26%,<br>
                  or near 40 GiB if you look at the overhead in
                  absolute numbers.<br>
                  <br>
                  This is the main disadvantage of ploop.<br>
                  <br>
                  And this disadvantage can't be avoided - it is "by
                  design".<br>
                </blockquote>
                <br>
              </div>
            </div>
            To anyone reading this, there are a few things here worth
            noting.<br>
            <br>
            a. Such overhead is caused by three things:<br>
            1. creating then removing data (vzctl compact takes care of
            that)<br>
            2. filesystem fragmentation (we have some experimental
            patches to ext4<br>
                plus an ext4 defragmenter to solve it, but currently
            it's still in the research stage)<br>
            3. initial filesystem layout (which depends on the initial
            ext4 fs size, including the inode requirement)<br>
            <br>
            So, #1 is solved, #2 is solvable, and #3 is a limitation of
            the underlying filesystem and can be mitigated<br>
            by properly choosing the initial size of a newly created ploop.<br>
            <br>
            An example of the #3 effect is this: if you create a very large
            filesystem initially (say, 16TB) and then<br>
            downsize it (say, to 1TB), the filesystem metadata overhead will
            be quite big. The same thing happens<br>
            if you ask for lots of inodes (here "lots" means more than the
            default value, which is 1 inode<br>
            per 16K of disk space). This happens because the ext4 filesystem
            is not designed to shrink.<br>
            Therefore, to have the lowest possible overhead you have to
            choose the initial filesystem size<br>
            carefully. Yes, this is not a solution but a workaround.<br>
            <br>
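            For illustration only, a minimal sketch of choosing the
            initial size up front at creation time (the CT ID, template
            and size values here are assumptions, not recommendations):<br>
            <br>
            # pick the final size up front instead of creating big and shrinking later<br>
            vzctl create 101 --ostemplate centos-7-x86_64 --layout ploop \<br>
                  --diskspace 50G --diskinodes 3276800  # ~1 inode per 16K of 50G, the default ratio<br>
            <br>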
            Also note that ploop was not designed with any specific
            filesystem in mind; it is<br>
            universal, so #3 can be solved by moving to a different fs
            in the future.<br>
            <br>
            Next, you can actually use shared base deltas for
            containers; although this is not<br>
            enabled by default, it is quite possible and works in
            practice. The key is to create a base delta<br>
            and use it for multiple containers (via hardlinks).<br>
            <br>
            Here is a quick and dirty example:<br>
            <br>
            SRCID=50 # "Donor" container ID<br>
            vztmpl-dl centos-7-x86_64 # to make sure we use the latest<br>
            vzctl create $SRCID --ostemplate centos-7-x86_64<br>
            vzctl snapshot $SRCID<br>
            for CT in $(seq 1000 2000); do<br>
                  mkdir -p /vz/private/$CT/root.hdd /vz/root/$CT<br>
                  # hardlink the base delta so all containers share it<br>
                  ln /vz/private/$SRCID/root.hdd/root.hdd
            /vz/private/$CT/root.hdd/root.hdd<br>
                  # copy DiskDescriptor.xml and the top delta; -n keeps the hardlinked base intact<br>
                  cp -nr /vz/private/$SRCID/root.hdd /vz/private/$CT/<br>
                  cp /etc/vz/conf/$SRCID.conf /etc/vz/conf/$CT.conf<br>
               done<br>
            vzctl set $SRCID --disabled yes --save # make sure we don't
            use it<br>
            <br>
            This will create about 1000 containers (so make sure your host
            has enough RAM),<br>
            each holding about 650 MB of files, so about 650 GB in total.
            Host disk space used will be<br>
            about 650 + 1000*1 MB before start (i.e. about 2 GB), or
            about 650 + 1000*30 MB<br>
            after start (i.e. about 32 GB). So:<br>
            <br>
            real data used inside the containers: near 650 GB<br>
            real space used on the hard disk: near 32 GB<br>
            <br>
            So, 20x disk space savings, and this result is reproducible.
            Surely it will get worse<br>
            over time etc., and this way of using ploop is neither
            official nor supported/recommended,<br>
            but that's not the point here. The points are:<br>
             - this is a demonstration of what you could do with ploop<br>
             - this shows why you shouldn't trust any numbers<span
              class=""><br>
              <br>
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">
=======================================================================<br>
                <br>
                My experience with ZFS:<br>
                <br>
                real data used inside the container: near 62 GiB,<br>
                real space used on the hard disk: near 11 GiB.<br>
              </blockquote>
              <br>
            </span>
            So, you are not even comparing apples to apples here. You
            just took two<br>
            different containers, certainly of different sizes, probably
            also different data sets<br>
            and usage history. I am not saying it's invalid, but if you want
            a meaningful<br>
            (rather than anecdotal) comparison, you need to use the same
            data sets, the same<br>
            operations on the data, etc., try to optimize each case, and
            compare.
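            <br>
            <br>
            A rough sketch of what such a measurement could look like
            (the ZFS dataset name is only an assumption about the
            layout; both containers would hold the same data and run
            the same workload):<br>
            <br>
            vzctl exec $CTID df -B1 /                # apparent usage inside the container<br>
            du -sb /vz/private/$CTID/root.hdd        # actual usage on the host (ploop case)<br>
            zfs list -o used,refer $POOL/$CTID       # actual usage on the host (ZFS case)<br>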
            <div class="HOEnZb">
              <div class="h5"><br>
                <br>
                <br>
                <br>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
      <pre wrap="">_______________________________________________
Users mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Users@openvz.org">Users@openvz.org</a>
<a class="moz-txt-link-freetext" href="https://lists.openvz.org/mailman/listinfo/users">https://lists.openvz.org/mailman/listinfo/users</a>
</pre>
    </blockquote>
    <br>
  </body>
</html>