<html><head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    On 07/23/2020 06:34 PM, CoolCold wrote:<br>

    <blockquote cite="mid:CAGqmV7rmsMc-Dd+p_pJxSGXA7oiv2q0QHnX9Ghm6Te=xjmfqLw@mail.gmail.com" type="cite">

      <div dir="ltr">

        <div dir="ltr">

          <div>Hello!</div>

          <div><br>

          </div>

          <div>1st - great work guys! Dealing with LXC and even LXD
            makes me miss my good old OpenVZ box because of its technical
            excellence! Keep going!<br>

          </div>

          <div>2nd - my 2 cents for content - I'm not a native speaker,
            but I'd still suggest some small fixes.</div>

        </div>

      </div>

    </blockquote>

    <br>

    1. Thank you very much for the feedback!<br>

    And you are very welcome to come back and use OpenVZ instead of LXC again. :)<br>

    <br>

    2. And many thanks for content corrections!<br>

    I've just created a wiki page for tcache - I decided this info should
    be saved somewhere publicly available.<br>
    I've also added a section on how to enable/disable tcache for
    Containers.<br>

    <br>

    And you are very welcome to edit the wiki page as well. :)<br>

    <br>

    <a href="https://wiki.openvz.org/Tcache">https://wiki.openvz.org/Tcache</a><br>

    <br>

    <pre class="moz-signature" cols="179">--

Best regards,

Konstantin Khorenko,

Virtuozzo Linux Kernel Team

</pre>

    <br>

    <blockquote cite="mid:CAGqmV7rmsMc-Dd+p_pJxSGXA7oiv2q0QHnX9Ghm6Te=xjmfqLw@mail.gmail.com" type="cite">

      <div dir="ltr">

        <div class="gmail_quote">

          <div dir="ltr" class="gmail_attr">On Thu, Jul 23, 2020 at 9:52

            PM Konstantin Khorenko &lt;<a moz-do-not-send="true" href="mailto:khorenko@virtuozzo.com">khorenko@virtuozzo.com</a>&gt;

            wrote:<br>

          </div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px

            0.8ex;border-left:1px solid

            rgb(204,204,204);padding-left:1ex">On 07/22/2020 03:04 PM,

            Daniel Pearson wrote:<br>

            <br>

            &gt;&gt; b) you can disable tcache for this Container<br>

            &gt;&gt; memcg::memory.disable_cleancache<br>

            &gt;&gt;&nbsp; &nbsp; &nbsp; &nbsp;(raise your hand if you wish me to explain

            what tcache is)<br>

            &gt;<br>

            &gt; I'm all for additional information as it can help to

            form proper<br>

            &gt; opinions if you don't mind providing it.<br>

            <br>

            I hope that after reading it you'll realize that you now
            know about one more<br>
            small feature which makes VZ really cool, and that there
            are a lot of things which<br>
            just work somewhere in the background, simply (and silently)
            making it possible for you<br>
            to utilize the hardware to the maximum. :)<br>

            <br>

            Tcache<br>

            ======<br>

            <br>

            Brief tech explanation:<br>

            =======================<br>

            Transcendent file cache (tcache) is a driver for cleancache<br>

            <a moz-do-not-send="true" href="https://www.kernel.org/doc/html/v4.18/vm/cleancache.html" rel="noreferrer" target="_blank">https://www.kernel.org/doc/html/v4.18/vm/cleancache.html</a>

            ,<br>

            which stores reclaimed pages in memory unmodified. Its
            purpose is to<br>

            adopt pages evicted from a memory cgroup on _local_ pressure

            (inside a Container),<br>

            so that they can be fetched back later without costly disk

            accesses.<br>

            <br>

            Detailed explanation:<br>

            =====================<br>

            Tcache is intended increase the overall Hardware Node

            performance only<br>

          </blockquote>

          <div>Intended &quot;to&quot;<br>

          </div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px

            0.8ex;border-left:1px solid

            rgb(204,204,204);padding-left:1ex">

            on undercommitted Nodes, i.e. sum of all Containers memory

            limits on the Node<br>

          </blockquote>

          <div>i.e. &quot;where total sum of all Containers memory limit

            values placed on the Node&quot; <br>

          </div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px

            0.8ex;border-left:1px solid

            rgb(204,204,204);padding-left:1ex">

            is less than Hardware Node RAM size.<br>

            <br>

            Imagine a situation: you have a Node with 1 TB of RAM,<br>
            you run 500 Containers on it, each limited to 1 GB of memory
            (no swap for simplicity).<br>
            Let's consider the Containers to be more or less identical:
            similar load, similar activity inside.<br>
            =&gt; normally those Containers use 500 GB of physical
            RAM at most, right,<br>
            and 500 GB will just stay free on the Node.<br>
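            <br>
            The same bookkeeping as a tiny Python sketch (illustration
            only: the function name is made up, and 1 TB is counted as
            1000 GB, as in the numbers above):<br>
<pre>
```python
def spare_ram_gb(node_ram_gb, container_limits_gb):
    """RAM left on the Node if every Container uses its full memory limit."""
    return node_ram_gb - sum(container_limits_gb)


# Numbers from the example: 1000 GB of RAM, 500 Containers limited to 1 GB each.
spare = spare_ram_gb(1000, [1] * 500)
print(spare)   # 500; a positive value means the Node is undercommitted
```
</pre>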

            <br>

            You may think it's a simple situation - ok, the Node is
            underloaded, let's put more Containers there,<br>
            but that's not always true - it depends on what the
            bottleneck on the Node is,<br>
            which depends on the real workload of the Containers running
            on the Node.<br>
            And most often in real life the disk becomes the
            bottleneck first, not the RAM, not the CPU.<br>

            <br>

            Example: let's assume all those Containers run, say, cPanel,
            which by default collects some stats<br>
            every, say, 15 minutes - the stats collection process is run
            via crontab.<br>

            <br>

            (Side note: randomizing the times of crontab jobs is a good
            idea, but who usually does this<br>
            for Containers? We did it for the application templates we
            shipped in Virtuozzo, but a lot of<br>
            software is just installed and configured inside Containers,
            where we cannot do this. And often<br>
            Hosting Providers are not allowed to touch data in
            Containers - so most often cron jobs are<br>
            not randomized.)<br>

            <br>

            Ok, it does not matter how, but let's assume we get such a
            workload - every, say, 15 minutes<br>
            (it's important that the data access is quite rare), each
            Container accesses many small files,<br>
            let it be just 100 small files, to gather stats and save them
            somewhere.<br>

            In 500 Containers. Simultaneously.<br>

            In parallel with other regular i/o workload.<br>

            On HDDs.<br>

            <br>

            It's a nightmare for the disk subsystem, you know: if an HDD
            provides 100 IOPS,<br>
            it will take 50000/100/60 = 8.(3) minutes(!) to handle.<br>
            OK, there could be a RAID, say it is able to handle 300 IOPS;
            that results in<br>
            2.(7) minutes, and we forgot about other regular i/o,<br>
            so it means that every 15 minutes the Node becomes almost
            unresponsive for several minutes<br>
            until it handles all that random i/o generated by stats
            collection.<br>
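            <br>
            To make the arithmetic above easy to replay with your own
            numbers, here is a small Python sketch. It is an
            illustration only: the function name is made up, and it
            assumes that each small-file read costs roughly one random
            I/O operation and that nothing else competes for the disk.<br>
<pre>
```python
def burst_minutes(containers, files_per_container, iops):
    """Minutes needed to serve one stats-collection burst purely from disk.

    Assumption: one small-file read is about one random I/O operation.
    """
    total_reads = containers * files_per_container
    return total_reads / iops / 60


single_hdd = burst_minutes(500, 100, 100)   # 50000 reads at 100 IOPS
raid_array = burst_minutes(500, 100, 300)   # 50000 reads at 300 IOPS
print(f"single HDD: {single_hdd:.2f} min, RAID: {raid_array:.2f} min")
# single HDD: 8.33 min, RAID: 2.78 min
```
</pre>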

            <br>

            You may ask - but why _every_ 15 minutes? Once you've read a
            file, it resides in the<br>
            Container pagecache!<br>
            That's true, but here the _15 minutes_ period comes into
            play. The longer the period, the worse.<br>
            If a Container is active enough, it just reads more and more
            files - website data,<br>
            pictures, video clips, files of a fileserver, whatever.<br>
            The thing is that in 15 minutes it's quite possible that a
            Container reads more than its RAM limit<br>
            (remember - only 1 GB in our case!), and thus all the old
            pagecache is dropped and replaced<br>
            with fresh one.<br>
            And thus in 15 minutes it's quite possible you'll have to
            read all those 100 files in each<br>
            Container from disk again.<br>

            <br>

            And here comes tcache to save us: let's not completely
            drop the pagecache which is<br>
            reclaimed from a Container (on local(!) reclaim), but save
            this pagecache in<br>
            a special cache (tcache) on the Host in case there is free
            RAM on the Host.<br>
            <br>
            And in 15 minutes, when all Containers start to access a lot
            of small files again,<br>
            the data of those files will be brought back into the
            Container pagecache without reading from<br>
            the physical disk - voilà, we save IOPS, and the Node does
            not get stuck anymore.<br>
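            <br>
            The idea can be played with in a toy Python model. This is
            not the kernel implementation, just a sketch under simple
            assumptions: both the Container pagecache and tcache are
            plain LRU lists counted in pages, and all class, function
            and page names are made up for the illustration.<br>
<pre>
```python
from collections import OrderedDict


class ToyContainer:
    """Toy model: a per-Container LRU pagecache with an optional tcache behind it."""

    def __init__(self, pagecache_pages, tcache_pages):
        self.pagecache = OrderedDict()   # pages charged to the Container
        self.tcache = OrderedDict()      # evicted pages kept in free Host RAM
        self.pagecache_pages = pagecache_pages
        self.tcache_pages = tcache_pages
        self.disk_reads = 0

    def read(self, page):
        if page in self.pagecache:
            self.pagecache.move_to_end(page)   # ordinary pagecache hit
            return
        if page in self.tcache:
            del self.tcache[page]              # fetched back from tcache, no disk i/o
        else:
            self.disk_reads += 1               # real disk read
        self.pagecache[page] = True
        # The pagecache grows by one page per miss, so it is over the limit
        # exactly when it holds one page more than allowed.
        if len(self.pagecache) == self.pagecache_pages + 1:
            victim, _ = self.pagecache.popitem(last=False)   # local reclaim
            self._adopt(victim)

    def _adopt(self, page):
        if self.tcache_pages == 0:
            return                             # tcache disabled: the page is dropped
        self.tcache[page] = True
        if len(self.tcache) == self.tcache_pages + 1:
            self.tcache.popitem(last=False)    # tcache is full, oldest page goes


def stats_rereads(tcache_pages):
    """Disk reads needed for the second stats run, 15 minutes after the first."""
    ct = ToyContainer(pagecache_pages=1000, tcache_pages=tcache_pages)
    stats = [("stats", i) for i in range(100)]
    for page in stats:
        ct.read(page)                          # first run: always from disk
    for i in range(2000):
        ct.read(("web", i))                    # regular traffic, larger than the limit
    before = ct.disk_reads
    for page in stats:
        ct.read(page)                          # second run
    return ct.disk_reads - before


print(stats_rereads(0), stats_rereads(5000), stats_rereads(500))   # 100 0 100
```
</pre>
            Without tcache all 100 stats pages are read from disk again;
            with a large enough tcache none of them are; and a tcache
            that is too small for the traffic in between does not help.<br>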

            <br>

            Q: can a Container be so active (i.e. read so much from

            disk) that this &quot;useful&quot;<br>

            pagecache is dropped even from tcache.<br>

          </blockquote>

          <div>missing question mark - ? <br>

          </div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px

            0.8ex;border-left:1px solid

            rgb(204,204,204);padding-left:1ex">

            A: Yes. But tcache extends the &quot;safe&quot; period.<br>

            <br>

            Q: mainstream? LXC/Proxmox?<br>

            A: No, it's Virtuozzo/OpenVZ specific.<br>
            &nbsp; &nbsp; &quot;cleancache&quot;, the base for tcache, is in mainstream;
            it's used for Xen.<br>

            &nbsp; &nbsp; But we (VZ) wrote a driver for it and use it for

            Containers as well.<br>

            <br>

            Q: I use SSD, not HDD; does tcache help me?<br>

            A: SSD can provide much more IOPS, thus the Node's

            performance increase caused by tcache<br>

            &nbsp; &nbsp; is less, but still reading from RAM (tcache is in RAM)

            is faster than reading from SSD.<br>

          </blockquote>

          <div>is less &quot;significant&quot;<br>

          </div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px

            0.8ex;border-left:1px solid

            rgb(204,204,204);padding-left:1ex">

            <br>

            <br>

            &gt;&gt; c) you can limit the max amount of memory which can

            be used for<br>

            &gt;&gt; pagecache for this Container<br>

            &gt;&gt;&nbsp; &nbsp; &nbsp; &nbsp;memcg::memory.cache.limit_in_bytes<br>

            &gt;<br>

            &gt; This seems viable to test as well. Currently it seems

            to be utilizing a<br>

            &gt; high number 'unlimited' default. I assume the only way

            to set this is to<br>

            &gt; directly interact with the memory cgroup and not via a

            standard ve<br>

            &gt; config value?<br>

            <br>

            Yes, you are right.<br>

            We use this setting for some internal system cgroups running
            processes<br>
            which are known to generate a lot of pagecache which surely
            won't be used later.<br>
            <br>
            From my perspective it's not fair to apply such a setting
            to a Container<br>
            globally - well, the CT owner pays for an amount of RAM and
            should be able to use<br>
            this RAM for whatever they want - even for pagecache,<br>
            so limiting the pagecache for a Container is not a tweak
            that is advised to be used<br>
            against a Container =&gt; no standard config parameter.<br>

            <br>

            Note: disabling tcache for a Container is completely fair,<br>
            you just disable an optimization of the whole Hardware Node
            performance,<br>
            but all the RAM configured for a Container is still available
            to the Container.<br>
            (But there is also no official config value for that - most
            often tcache helps, not hurts.)<br>
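            <br>
            Since both knobs are set by writing directly into the memory
            cgroup, here is a hedged Python sketch of that. Only the two
            file names come from this thread; the helper names are made
            up, the value 1 for disabling is an assumption based on the
            knob's name, and the Container's memory cgroup directory is
            something you have to look up on your own Node. The example
            runs against a scratch directory, not a real cgroup.<br>
<pre>
```python
import tempfile
from pathlib import Path


def set_memcg_knob(cgroup_dir, knob, value):
    """Write a value into a memory cgroup control file and return its path."""
    path = Path(cgroup_dir) / knob
    path.write_text(f"{value}\n")
    return path


def disable_tcache(cgroup_dir):
    # Assumption: writing 1 to memory.disable_cleancache turns tcache off
    # for this cgroup.
    return set_memcg_knob(cgroup_dir, "memory.disable_cleancache", 1)


def limit_pagecache(cgroup_dir, limit_bytes):
    # Caps the pagecache of this cgroup; see the fairness remark above
    # before applying it to a whole Container.
    return set_memcg_knob(cgroup_dir, "memory.cache.limit_in_bytes", limit_bytes)


# Dry run against a scratch directory instead of a real memory cgroup.
scratch = tempfile.mkdtemp()
knob = disable_tcache(scratch)
print(knob.name, knob.read_text().strip())   # memory.disable_cleancache 1
```
</pre>
            On a real Node you would pass the Container's memory cgroup
            directory instead of the scratch one and run it as root.<br>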

            <br>

            <br>

            &gt; I assume regardless if we utilized vSwap or not, we

            would likely still<br>

            &gt; experience these additional swapping issues, presumably

            from pagecache<br>

            &gt; applications, or would the usage of vSwap intercept

            some of these items<br>

            &gt; thus preventing them from being swapped to disk?<br>

            <br>

            vSwap is an optimization of the swapping process _local to a
            Container_:<br>
            it can prevent some Container anonymous pages from being
            written to the physical swap<br>
            if _local_ Container reclaim decides to swap something out.<br>

            <br>

            At the moment you experience swapping on the Node level.<br>

            Even if some Container's processes are put to the physical

            swap,<br>

            it's a decision of the global reclaim mechanism,<br>

            so it's completely unrelated to vSwap =&gt;<br>

            even if you assign some swappages to Containers and thus
            enable vSwap for those Containers,<br>
            it should not influence the global Node-level memory
            pressure in any way and<br>
            will not result in any difference in the rate of swapping
            into the physical swap.<br>

            <br>

            Hope that helps.<br>

            <br>

            --<br>

            Best regards,<br>

            <br>

            Konstantin Khorenko,<br>

            Virtuozzo Linux Kernel Team<br>

            _______________________________________________<br>

            Users mailing list<br>

            <a moz-do-not-send="true" href="mailto:Users@openvz.org" target="_blank">Users@openvz.org</a><br>

            <a moz-do-not-send="true" href="https://lists.openvz.org/mailman/listinfo/users" rel="noreferrer" target="_blank">https://lists.openvz.org/mailman/listinfo/users</a><br>

          </blockquote>

        </div>

        <br clear="all">

        <br>

        -- <br>

        <div dir="ltr" class="gmail_signature">Best regards,<br>

          [COOLCOLD-RIPN] </div>

      </div>

      <br>


    </blockquote>

    <br>

  </body>

</html>