<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    On 07/22/2015 11:59 PM, Сергей Мамонов wrote:<br>
    <blockquote
cite="mid:CAG2oxtqCTVxSbiCzX6mB9qjhjU3fxVTi5FHR-Wr6cEvByddyzg@mail.gmail.com"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div>
          <div>
            <div>
              <div>
                <div>
                  <div>
                    <div>
                      <div>
                        <div>
                          &gt;1. creating then removing data (vzctl
                          compact takes care of that)<br>
                          &gt;So, #1 is solved<br>
                          <br>
                          Only partially, in fact.<br>
                        </div>
                        1. Compact "eats" a lot of resources, because
                        of its heavy use of the disk.<br>
                      </div>
                    2. You need to compact your ploop very, very
                    regularly.<br>
                    <br>
                  </div>
                    On our nodes, where we run compact every day, the
                    daily delta on a 3-5 TB /vz/ is about 4-20% of the
                    space!<br>
                  </div>
                  Every day it has to clean 300-500+ GB.<br>
                  <br>
                </div>
                And even then it does not clean everything. For
                example:<br>
                <br>
                [root@evo12 ~]# vzctl compact 75685<br>
                Trying to find free extents bigger than 0 bytes<br>
                Waiting<br>
                Call FITRIM, for minlen=33554432<br>
                Call FITRIM, for minlen=16777216<br>
                Call FITRIM, for minlen=8388608<br>
                Call FITRIM, for minlen=4194304<br>
                Call FITRIM, for minlen=2097152<br>
                Call FITRIM, for minlen=1048576<br>
                0 clusters have been relocated<br>
                [root@evo12 ~]# ls -lhat
                /vz/private/75685/root.hdd/root.hdd<br>
                -rw------- 1 root root 43G Jul 20 20:45
                /vz/private/75685/root.hdd/root.hdd<br>
                [root@evo12 ~]# vzctl exec 75685 df -h /<br>
                Filesystem         Size  Used Avail Use% Mounted on<br>
                /dev/ploop32178p1   50G   26G   21G  56% /<br>
                [root@evo12 ~]# vzctl --version<br>
                vzctl version 4.9.2<br>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    This is either #2 or #3 from my list, or both.<br>
    <br>
    <blockquote
cite="mid:CAG2oxtqCTVxSbiCzX6mB9qjhjU3fxVTi5FHR-Wr6cEvByddyzg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div>
          <div>
            <div>
              <div><br>
                &gt;My point was, the feature works fine for many people
                despite this bug.<span class="im"><br>
                </span></div>
              <span class="im"><br>
              </span></div>
            Not fine, but we need it very much for migration and not
            only that. So we use it anyway; in fact, we have no
            alternative.<br>
          </div>
          And this is one of those bugs: live migration regularly
          fails because vzctl cannot restore the container correctly
          after suspend.<br>
        </div>
      </div>
    </blockquote>
    <br>
    You really need to file bugs if you want fixes.<br>
    <br>
    <blockquote
cite="mid:CAG2oxtqCTVxSbiCzX6mB9qjhjU3fxVTi5FHR-Wr6cEvByddyzg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div>CPT is a pain, in fact. But I want to believe that CRIU
          will fix everything =)<br>
          <br>
        </div>
        <div>And ext4-only under ploop is not a good setup, and not a
          modern one either.<br>
        </div>
        <div>For example, on some big nodes we have several /vz/
          partitions, because the RAID controller cannot put all the
          disks into one RAID10 logical device. And several /vz/
          partitions are not comfortable to manage, and less flexible
          than a single zpool, for example.<br>
        </div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">2015-07-23 5:44 GMT+03:00 Kir Kolyshkin
          <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:kir@openvz.org" target="_blank">kir@openvz.org</a>&gt;</span>:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex"><span
              class=""><br>
              <br>
              On 07/22/2015 10:08 AM, Gena Makhomed wrote:<br>
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">
                On 22.07.2015 8:39, Kir Kolyshkin wrote:<br>
                <br>
                <blockquote class="gmail_quote" style="margin:0 0 0
                  .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <blockquote class="gmail_quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    1) currently even suspend/resume does not work reliably:<br>
                    <a moz-do-not-send="true"
                      href="https://bugzilla.openvz.org/show_bug.cgi?id=2470"
                      rel="noreferrer" target="_blank">https://bugzilla.openvz.org/show_bug.cgi?id=2470</a><br>
                    - I can't suspend and resume containers without
                    hitting bugs,<br>
                    and as a result I also can't use it for live
                    migration.<br>
                  </blockquote>
                  <br>
                  Valid point, we need to figure it out. What I don't
                  understand<br>
                  is how lots of users are enjoying live migration
                  despite this bug.<br>
                  Me, personally, I never came across this.<br>
                </blockquote>
                <br>
                Nevertheless, steps to reproduce the bug 100% of the
                time are provided in the bug report.<br>
              </blockquote>
              <br>
            </span>
            I was not saying anything about the bug report being
            bad/incomplete.<br>
            My point was, the feature works fine for many people despite
            this bug.<span class=""><br>
              <br>
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <br>
                <blockquote class="gmail_quote" style="margin:0 0 0
                  .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <blockquote class="gmail_quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    2) I see on Google many bug reports about this
                    feature:<br>
                    "openvz live migration kernel panic" - so I prefer
                    to do<br>
                    planned downtime of containers at night instead<br>
                    of unexpected and very painful kernel panics and<br>
                    complete reboots in the middle of the working day<br>
                    (with data loss, data corruption and other
                    "amenities").<br>
                  </blockquote>
                  <br>
                  Unlike the previous item, which is valid, this is pure
                  FUD.<br>
                </blockquote>
                <br>
                Compare two situations:<br>
                <br>
                1) Live migration not used at all<br>
                <br>
                2) Live migration used and containers migrated between
                HN<br>
                <br>
                In which situation is the probability of getting a
                kernel panic higher?<br>
                <br>
                If you say "the probabilities are equal", this means<br>
                that the OpenVZ live migration code has no errors at all.<br>
                <br>
                Is that plausible? Especially when you consider the
                volume and complexity<br>
                of the OpenVZ live migration code, and the grandiosity
                of this task.<br>
                <br>
                If you say "for (1) the probability is lower and for (2)<br>
                the probability is higher" - that is the same as what I
                think.<br>
                <br>
                I don't use live migration because I don't want kernel
                panics.<br>
              </blockquote>
              <br>
            </span>
            Following your logic, if you don't want kernel panics, you
            might want<br>
            to not use advanced filesystems such as ZFS, not use
            containers,<br>
            cgroups, namespaces, etc. The ultimate solution here, of
            course,<br>
            is to not use the kernel at all -- this will totally
            guarantee no kernel<br>
            panics at all, ever.<br>
            <br>
            On a serious note, I find your logic flawed.<span class=""><br>
              <br>
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <br>
                And you say that this is "pure FUD"? Why?<br>
              </blockquote>
              <br>
            </span>
            Because it is not based on your experience or correct
            statistics,<br>
            but rather on something you saw on Google followed by some<br>
            flawed logic.
            <div>
              <div class="h5"><br>
                <br>
                <blockquote class="gmail_quote" style="margin:0 0 0
                  .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  <br>
                  <br>
                  <blockquote class="gmail_quote" style="margin:0 0 0
                    .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    <blockquote class="gmail_quote" style="margin:0 0 0
                      .8ex;border-left:1px #ccc solid;padding-left:1ex">
                      4) from a technical point of view it is possible<br>
                      to do live migration using ZFS, so "live
                      migration"<br>
                      is currently the only advantage of ploop over ZFS<br>
                    </blockquote>
                    <br>
                    I wouldn't say so. If you have some real world
                    comparison<br>
                    of zfs vs ploop, feel free to share. Like density or
                    performance<br>
                    measurements, done in a controlled environment.<br>
                  </blockquote>
                  <br>
                  Ok.<br>
                  <br>
                  My experience with ploop:<br>
                  <br>
                  DISKSPACE was limited to 256 GiB, and the real data
                  used inside the container<br>
                  was near 40-50% of the 256 GiB limit, but the ploop
                  image was a lot bigger:<br>
                  it used nearly 256 GiB of space on the hardware node.
                  Overhead ~ 50-60%.<br>
                  <br>
                  I found a workaround for this: run "/usr/sbin/vzctl
                  compact $CT"<br>
                  via cron every night, and now the ploop image has
                  less overhead.<br>
                  <br>
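                  A minimal sketch of such a nightly job (the cron file
                  path is hypothetical, and it assumes every running
                  container on the node should be compacted):<br>
                  <br>
                  # /etc/cron.d/vzcompact - compact all running containers at 03:00<br>
                  0 3 * * * root for CT in $(/usr/sbin/vzlist -H -o ctid); do /usr/sbin/vzctl compact $CT; done<br>
                  <br>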
                  current state:<br>
                  <br>
                  on the hardware node:<br>
                  <br>
                  # du -b /vz/private/155/root.hdd<br>
                  205963399961    /vz/private/155/root.hdd<br>
                  <br>
                  inside the container:<br>
                  <br>
                  # df -B1<br>
                  Filesystem           1B-blocks          Used     Available Use% Mounted on<br>
                  /dev/ploop38149p1 270426705920  163129053184   94928560128  64% /<br>
                  <br>
                  ====================================<br>
                  <br>
                  used space, bytes: 163129053184<br>
                  <br>
                  image size, bytes: 205963399961<br>
                  <br>
                  The disk space overhead of the "ext4 over ploop over
                  ext4" solution is near 26%,<br>
                  or near 40 GiB if you look at the overhead in
                  absolute numbers.<br>
                  <br>
                  This is the main disadvantage of ploop.<br>
                  <br>
                  And this disadvantage can't be avoided - it is "by
                  design".<br>
                </blockquote>
                <br>
              </div>
            </div>
            To anyone reading this, there are a few things here worth
            noting.<br>
            <br>
            a. Such overhead is caused by three things:<br>
            1. creating then removing data (vzctl compact takes care of
            that)<br>
            2. filesystem fragmentation (we have some experimental
            patches to ext4<br>
                plus an ext4 defragmenter to solve it, but currently
            it's still in the research stage)<br>
            3. initial filesystem layout (which depends on the initial
            ext4 fs size, including the inode requirement)<br>
            <br>
            So, #1 is solved, #2 is solvable, and #3 is a limitation of
            the underlying filesystem and can be mitigated<br>
            by properly choosing the initial size of a newly created ploop.<br>
            <br>
            An example of the #3 effect is this: if you create a very large
            filesystem initially (say, 16TB) and then<br>
            downsize it (say, to 1TB), the filesystem metadata overhead will
            be quite big. The same thing happens<br>
            if you ask for lots of inodes (here "lots" means more than the
            default value, which is 1 inode<br>
            per 16K of disk space). This happens because the ext4 filesystem
            is not designed to shrink.<br>
            Therefore, to have the lowest possible overhead you have to
            choose the initial filesystem size<br>
            carefully. Yes, this is not a solution but a workaround.<br>
            <br>
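            For illustration only, a minimal sketch of choosing the
            initial size up front at creation time (the CT ID, template
            and size values here are assumptions, not recommendations):<br>
            <br>
            # pick the final size up front instead of creating big and shrinking later<br>
            vzctl create 101 --ostemplate centos-7-x86_64 --layout ploop \<br>
                  --diskspace 50G --diskinodes 3276800  # ~1 inode per 16K of 50G, the default ratio<br>
            <br>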
            Also note that ploop was not designed with any specific
            filesystem in mind; it is<br>
            universal, so #3 can be solved by moving to a different fs
            in the future.<br>
            <br>
            Next, you can actually use shared base deltas for
            containers; although this is not<br>
            enabled by default, it is quite possible and works in
            practice. The key is to create a base delta<br>
            and use it for multiple containers (via hardlinks).<br>
            <br>
            Here is a quick and dirty example:<br>
            <br>
            SRCID=50 # "Donor" container ID<br>
            vztmpl-dl centos-7-x86_64 # to make sure we use the latest<br>
            vzctl create $SRCID --ostemplate centos-7-x86_64<br>
            vzctl snapshot $SRCID<br>
            for CT in $(seq 1000 2000); do<br>
                  mkdir -p /vz/private/$CT/root.hdd /vz/root/$CT<br>
                  # hardlink the base delta so all containers share it<br>
                  ln /vz/private/$SRCID/root.hdd/root.hdd
            /vz/private/$CT/root.hdd/root.hdd<br>
                  # copy DiskDescriptor.xml and the top delta; -n keeps the hardlinked base intact<br>
                  cp -nr /vz/private/$SRCID/root.hdd /vz/private/$CT/<br>
                  cp /etc/vz/conf/$SRCID.conf /etc/vz/conf/$CT.conf<br>
               done<br>
            vzctl set $SRCID --disabled yes --save # make sure we don't
            use it<br>
            <br>
            This will create about 1000 containers (so make sure your host
            has enough RAM),<br>
            each holding about 650 MB of files, so about 650 GB in total.
            Host disk space used will be<br>
            about 650 + 1000*1 MB before start (i.e. about 2 GB), or
            about 650 + 1000*30 MB<br>
            after start (i.e. about 32 GB). So:<br>
            <br>
            real data used inside the containers: near 650 GB<br>
            real space used on the hard disk: near 32 GB<br>
            <br>
            So, 20x disk space savings, and this result is reproducible.
            Surely it will get worse<br>
            over time etc., and this way of using ploop is neither
            official nor supported/recommended,<br>
            but that's not the point here. The points are:<br>
             - this is a demonstration of what you could do with ploop<br>
             - this shows why you shouldn't trust any numbers<span
              class=""><br>
              <br>
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">
=======================================================================<br>
                <br>
                My experience with ZFS:<br>
                <br>
                real data used inside the container: near 62 GiB,<br>
                real space used on the hard disk: near 11 GiB.<br>
              </blockquote>
              <br>
            </span>
            So, you are not even comparing apples to apples here. You
            just took two<br>
            different containers, certainly of different sizes, probably
            also different data sets<br>
            and usage history. I am not saying it's invalid, but if you want
            a meaningful<br>
            (rather than anecdotal) comparison, you need to use the same
            data sets, the same<br>
            operations on the data, etc., try to optimize each case, and
            compare.
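            <br>
            <br>
            A rough sketch of what such a measurement could look like
            (the ZFS dataset name is only an assumption about the
            layout; both containers would hold the same data and run
            the same workload):<br>
            <br>
            vzctl exec $CTID df -B1 /                # apparent usage inside the container<br>
            du -sb /vz/private/$CTID/root.hdd        # actual usage on the host (ploop case)<br>
            zfs list -o used,refer $POOL/$CTID       # actual usage on the host (ZFS case)<br>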
            <div class="HOEnZb">
              <div class="h5"><br>
                <br>
                <br>
                <br>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
      <pre wrap="">_______________________________________________
Users mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Users@openvz.org">Users@openvz.org</a>
<a class="moz-txt-link-freetext" href="https://lists.openvz.org/mailman/listinfo/users">https://lists.openvz.org/mailman/listinfo/users</a>
</pre>
    </blockquote>
    <br>
  </body>
</html>