<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
On 07/23/2020 06:34 PM, CoolCold wrote:<br>
<blockquote cite="mid:CAGqmV7rmsMc-Dd+p_pJxSGXA7oiv2q0QHnX9Ghm6Te=xjmfqLw@mail.gmail.com" type="cite">
<div dir="ltr">
<div dir="ltr">
<div>Hello!</div>
<div><br>
</div>
<div>1st - great work guys! Dealing with LXC and even LXD
makes me miss my good old OpenVZ box because of its
technical excellence! Keep going!<br>
</div>
<div>2nd - my 2 cents for content - I'm not a native speaker,
but still suggest some small fixes.</div>
</div>
</div>
</blockquote>
<br>
1. Thank you very much for the feedback!<br>
And you are very welcome to come back to OpenVZ instead of LXC. :)<br>
<br>
2. And many thanks for the content corrections!<br>
I've just created a wiki page for tcache - decided this info should
be saved somewhere publicly available.<br>
I've also added a section on how to enable/disable tcache for
Containers.<br>
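Enabling/disabling tcache boils down to writing the per-Container memcg knob mentioned below (memory.disable_cleancache). A minimal sketch - the memory cgroup mount point and the machine.slice/&lt;CTID&gt; layout are assumptions, check where your node actually mounts the Container's memory cgroup:

```python
from pathlib import Path

# Assumed layout; adjust MEMCG_ROOT for your node.
MEMCG_ROOT = Path("/sys/fs/cgroup/memory/machine.slice")

def tcache_knob(ctid: str) -> Path:
    """Path of the per-Container cleancache (tcache) switch (assumed layout)."""
    return MEMCG_ROOT / ctid / "memory.disable_cleancache"

def set_tcache(ctid: str, enabled: bool) -> None:
    # Writing "1" disables tcache for the Container, "0" re-enables it.
    tcache_knob(ctid).write_text("0" if enabled else "1")
```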
<br>
And you are very welcome to edit the wiki page as well. :)<br>
<br>
<a href="https://wiki.openvz.org/Tcache">https://wiki.openvz.org/Tcache</a><br>
<br>
<pre class="moz-signature" cols="179">--
Best regards,
Konstantin Khorenko,
Virtuozzo Linux Kernel Team
</pre>
<br>
<blockquote cite="mid:CAGqmV7rmsMc-Dd+p_pJxSGXA7oiv2q0QHnX9Ghm6Te=xjmfqLw@mail.gmail.com" type="cite">
<div dir="ltr">
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Thu, Jul 23, 2020 at 9:52
PM Konstantin Khorenko <<a moz-do-not-send="true" href="mailto:khorenko@virtuozzo.com">khorenko@virtuozzo.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">On 07/22/2020 03:04 PM,
Daniel Pearson wrote:<br>
<br>
>> b) you can disable tcache for this Container<br>
>> memcg::memory.disable_cleancache<br>
>> (raise your hand if you wish me to explain
what tcache is)<br>
><br>
> I'm all for additional information as it can help to
form proper<br>
> opinions if you don't mind providing it.<br>
<br>
Hope after reading it you'll catch yourself on the idea that
now you are aware of one more<br>
small feature which makes VZ really cool, and that there
are a lot of things which<br>
just work somewhere in the background, simply (and silently)
making it possible for you<br>
to utilize the hardware to the maximum. :)<br>
<br>
Tcache<br>
======<br>
<br>
Brief tech explanation:<br>
=======================<br>
Transcendent file cache (tcache) is a driver for cleancache<br>
<a moz-do-not-send="true" href="https://www.kernel.org/doc/html/v4.18/vm/cleancache.html" rel="noreferrer" target="_blank">https://www.kernel.org/doc/html/v4.18/vm/cleancache.html</a>
,<br>
which stores reclaimed pages in memory unmodified. Its
purpose is to<br>
adopt pages evicted from a memory cgroup on _local_ pressure
(inside a Container),<br>
so that they can be fetched back later without costly disk
accesses.<br>
<br>
Detailed explanation:<br>
=====================<br>
Tcache is intended increase the overall Hardware Node
performance only<br>
</blockquote>
<div>Intented "to"<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
on undercommitted Nodes, i.e. sum of all Containers memory
limits on the Node<br>
</blockquote>
<div>i.e. "where total sum of all Containers memory limit
values placed on the Node" <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
is less than Hardware Node RAM size.<br>
<br>
Imagine a situation: you have a Node with 1TB of RAM,<br>
and you run 500 Containers on it limited to 1GB of memory each
(no swap for simplicity).<br>
Let's consider the Containers to be more or less identical:
similar load, similar activity inside.<br>
=> normally those Containers will use 500GB of physical
RAM at most,<br>
and 500GB will just be free on the Node.<br>
<br>
You might think it's a simple situation - OK, the Node is
underloaded, let's put more Containers there -<br>
but that's not always true: it depends on what the
bottleneck on the Node is,<br>
which depends on the real workload of the Containers running
on the Node.<br>
And most often in real life the disk becomes the
bottleneck first, not the RAM, not the CPU.<br>
<br>
Example: let's assume all those Containers run, say, cPanel,
which by default collects some stats<br>
every, say, 15 minutes - the stat collection process is run
via crontab.<br>
<br>
(Side note: randomizing the times of crontab jobs is a good
idea, but who usually does this<br>
for Containers? We did it for the application templates we
shipped in Virtuozzo, but a lot of<br>
software is just installed and configured inside Containers,
so we cannot do this. And often<br>
Hosting Providers are not allowed to touch data in
Containers - so most often cron jobs are<br>
not randomized.)<br>
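For completeness, randomizing such a cron job is cheap. A sketch of building a crontab line with a random per-Container offset (the stats command path here is hypothetical):

```python
import random

def randomized_cron_line(cmd: str, period_min: int = 15) -> str:
    """Build a crontab line that runs cmd every period_min minutes,
    starting at a random offset, so hundreds of Containers don't
    all fire at minute 0."""
    offset = random.randrange(period_min)                      # 0..period_min-1
    minutes = ",".join(str(m) for m in range(offset, 60, period_min))
    return f"{minutes} * * * * {cmd}"
```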
<br>
OK, it does not matter how, but let's assume we get such a
workload - every, say, 15 minutes<br>
(it's important that data access is quite rare), each
Container accesses many small files,<br>
let it be just 100 small files, to gather stats and save them
somewhere.<br>
In 500 Containers. Simultaneously.<br>
In parallel with other regular i/o workload.<br>
On HDDs.<br>
<br>
It's a nightmare for the disk subsystem: 500 Containers x 100
files = 50000 random reads;<br>
if an HDD provides 100 IOPS,
it will take 50000/100/60 = 8.(3) minutes(!) to handle them.<br>
OK, there could be a RAID; let's say it is able to handle 300
IOPS, which results in<br>
2.(7) minutes, and we forgot about the other regular i/o,<br>
so it means that every 15 minutes the Node becomes almost
unresponsive for several minutes<br>
until it handles all that random i/o generated by stats
collection.<br>
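The arithmetic above, spelled out (100 and 300 IOPS are the illustrative figures from the text, not measurements):

```python
containers = 500
files_per_container = 100                        # small files read per stats run
total_reads = containers * files_per_container   # 50000 random IOs

def drain_minutes(iops: float) -> float:
    """Minutes needed to serve all the reads at a given IOPS rate."""
    return total_reads / iops / 60

hdd_minutes  = drain_minutes(100)    # single HDD: ~8.33 minutes
raid_minutes = drain_minutes(300)    # small RAID: ~2.78 minutes
```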
<br>
You can ask - but why _every_ 15 minutes? You've read a
file once and it resides in the<br>
Container pagecache!<br>
That's true, but here comes the _15 minutes_ period. The larger
the period - the worse.<br>
If a Container is active enough, it just reads more and more
files - website data,<br>
pictures, video clips, files on a fileserver, whatever.<br>
The thing is, in 15 minutes it's quite possible a Container
reads more than its RAM limit<br>
(remember - only 1GB in our case!), and thus all the old
pagecache is dropped, substituted<br>
with the fresh one.<br>
And thus in 15 minutes it's quite possible you'll have to
read all those 100 files in each<br>
Container from disk again.<br>
<br>
And here comes tcache to save us: let's not completely
drop pagecache which is<br>
reclaimed from a Container (on local(!) reclaim), but save
this pagecache in<br>
a special cache (tcache) on the Host in case there is free
RAM on the Host.<br>
<br>
And in 15 minutes, when all Containers start to access lots of
small files again,<br>
those files' data will be fetched back into the Container pagecache
without reading from the<br>
physical disk - voila, we save IOPS, and the Node doesn't get stuck anymore.<br>
<br>
Q: can a Container be so active (i.e. read so much from
disk) that this "useful"<br>
pagecache is dropped even from tcache.<br>
</blockquote>
<div>missing question mark - ? <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
A: Yes. But tcache extends the "safe" period.<br>
<br>
Q: mainstream? LXC/Proxmox?<br>
A: No, it's Virtuozzo/OpenVZ specific.<br>
"cleancache" - the base for tcache it in mainstream,
it's used for Xen.<br>
But we (VZ) wrote a driver for it and use it for
Containers as well.<br>
<br>
Q: I use an SSD, not an HDD, does tcache help me?<br>
A: SSD can provide much more IOPS, thus the Node's
performance increase caused by tcache<br>
is less, but still reading from RAM (tcache is in RAM)
is faster than reading from SSD.<br>
</blockquote>
<div>is less "significant"<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<br>
<br>
>> c) you can limit the max amount of memory which can
be used for<br>
>> pagecache for this Container<br>
>> memcg::memory.cache.limit_in_bytes<br>
><br>
> This seems viable to test as well. Currently it seems
to be utilizing a<br>
> high number 'unlimited' default. I assume the only way
to set this is to<br>
> directly interact with the memory cgroup and not via a
standard ve<br>
> config value?<br>
<br>
Yes, you are right.<br>
We use this setting for some internal system cgroups running
processes<br>
which are known to generate a lot of pagecache which won't
be used later for sure.<br>
<br>
From my perspective it's not fair to apply such a setting
to a Container<br>
globally - well, the CT owner pays for an amount of RAM and
should be able to use<br>
this RAM for whatever they want - even for pagecache,<br>
so limiting the pagecache for a Container is not a tweak we
advise to be used<br>
against a Container => no standard config parameter.<br>
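For the curious: applying memory.cache.limit_in_bytes to such an internal system cgroup is just another memcg write. A sketch - the cgroup mount path and cgroup name are assumptions, only the knob name comes from the text above:

```python
from pathlib import Path

GiB = 1024 ** 3

def cache_limit_knob(cgroup: str) -> Path:
    # "memory.cache.limit_in_bytes" is the Virtuozzo-specific knob quoted
    # above; the mount path and cgroup name here are assumptions.
    return Path("/sys/fs/cgroup/memory") / cgroup / "memory.cache.limit_in_bytes"

def limit_pagecache(cgroup: str, gib: int) -> None:
    """Cap the pagecache of the given cgroup at gib GiB."""
    cache_limit_knob(cgroup).write_text(str(gib * GiB))
```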
<br>
Note: disabling tcache for a Container is completely fair:<br>
you just disable an optimization for the whole Hardware Node's
performance,<br>
while all the RAM configured for the Container is still available
to the Container.<br>
(But there is also no official config value for that - most often it
helps, not hurts.)<br>
<br>
<br>
> I assume regardless if we utilized vSwap or not, we
would likely still<br>
> experience these additional swapping issues, presumably
from pagecache<br>
> applications, or would the usage of vSwap intercept
some of these items<br>
> thus preventing them from being swapped to disk?<br>
<br>
vSwap is an optimization for the swapping process _local to a
Container_:<br>
it can prevent some of a Container's anonymous pages from being written
to the physical swap,<br>
if the _local_ Container reclaim decides to swap something out.<br>
<br>
At the moment you experience swapping on the Node level.<br>
Even if some Container's processes are put into the physical
swap,<br>
it's a decision of the global reclaim mechanism,<br>
so it's completely unrelated to vSwap =><br>
even if you assign some swap pages to Containers and thus
enable vSwap for those Containers,<br>
it should not influence global Node-level memory
pressure in any way and<br>
will not result in any difference in the swapping rate into
the physical swap.<br>
<br>
Hope that helps.<br>
<br>
--<br>
Best regards,<br>
<br>
Konstantin Khorenko,<br>
Virtuozzo Linux Kernel Team<br>
_______________________________________________<br>
Users mailing list<br>
<a moz-do-not-send="true" href="mailto:Users@openvz.org" target="_blank">Users@openvz.org</a><br>
<a moz-do-not-send="true" href="https://lists.openvz.org/mailman/listinfo/users" rel="noreferrer" target="_blank">https://lists.openvz.org/mailman/listinfo/users</a><br>
</blockquote>
</div>
<br clear="all">
<br>
-- <br>
<div dir="ltr" class="gmail_signature">Best regards,<br>
[COOLCOLD-RIPN] </div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
Users mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Users@openvz.org">Users@openvz.org</a>
<a class="moz-txt-link-freetext" href="https://lists.openvz.org/mailman/listinfo/users">https://lists.openvz.org/mailman/listinfo/users</a>
</pre>
</blockquote>
<br>
</body>
</html>