[Users] Some details of vSwap implementation in Virtuozzo 7

Daniel Pearson daniel at knownhost.com
Wed Jul 22 15:04:21 MSK 2020


Howdy, thanks for your reply.

On 7/22/20 5:59 AM, Konstantin Khorenko wrote:
> On 07/21/2020 06:55 PM, Daniel Pearson wrote:
>> Thanks for posting this. I have some doubts about your explanation,
>> along with some of the information we received in tickets as well. For
>> clarification, we primarily run web-server VMs.
>>
>> We do not use swappages; this is set to 0:0 within the containers.
>> However, over a long period of time (~100 days), regardless of the
>> kernel we use, we see very odd processes drop into node-level swap
>> regardless of free memory.
>
> Hi Daniel,
>
> OK, then this is completely unrelated to vSwap.
> You don't have vSwap configured on your nodes (you don't use swappages
> for Containers),
>
> (you can check if vSwap is used on a node at all:
>     # grep Tswap /proc/meminfo
>     Tswap:                 0 kB)

So that is correct, this is fully disabled.
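(As an aside, a quick way to see which processes currently have pages in 
swap is just to read VmSwap from /proc/<pid>/status - a rough one-liner, 
nothing Virtuozzo-specific, and the PIDs still have to be mapped back to 
process names by hand:

    # grep VmSwap /proc/[0-9]*/status | awk '$2 > 0' | sort -k2 -nr | head
)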


>
> thus your question relates to the generic memory management algorithm of
> the Hardware Node.
>
> And yes, it's completely fine if mm decides to move some anonymous
> memory into swap
> (physical swap, no vSwap here!) and use the RAM for more caches instead.
>
>> So during these 100 days available memory has never gone to zero, nor
>> has it gone to the point where this many processes should swap. However,
>> we get 20 GB worth of swap, the majority of which ends up being very
>> active applications such as MySQL and httpd processes,
>
> And here comes your real question - why might active applications get
> into swap?
> (leaving aside the question of whether those applications are really
> _always_ active,
> because if they are not active _for some time_, then it is no surprise)
>
> One very common reason is some process - it does not matter whether on
> the host or in a Container -
> which generates _a lot of_ pagecache.
> Most often it's some kind of homebrew backup process - simply because
> it is its job to read
> files from disk while performing a backup - and thus it generates a
> lot of pagecache.
>
> Once again - such processes might be run on the host (the hoster
> performs the backups)
> or in a Container (the CT owner performs backups on their own).
>
> And mm does not know whether this pagecache is "useful" and will be used
> by someone a bit later or not,
> and by default it considers it useful. And if your disks are very
> fast, new pagecache is generated
> so fast that even those applications you consider "active" are not
> active enough from the algorithm's
> point of view, and thus pages from these processes might go into swap.
>
> What can be done with such "backup" processes?
>
> 1) if those processes are run on the host:
>    a) rewrite the backups so that they do not generate pagecache.
>       For example:
>       # dd if=/vz/private/123/root.hdd/root.hds iflag=direct | gzip > root.hds.gz
>       (note, I'm not talking here about consistency - that you must back
> up snapshots, not the live image, etc.
>        That's a different topic. Use the commercial Virtuozzo backups, at
> least :). )
>
>    b) if you cannot rewrite the backup software, run the backup in a
> separate memory cgroup
>       and limit the pagecache for this cgroup.
>       This is a Virtuozzo-specific memcg limit; the mainstream kernel
> does not have it, and therefore neither do LXC/Proxmox.
>       We've implemented this feature specifically to fight such
> processes, which generate a lot of pagecache by their nature.

We've written our own backup process which relies on snapshots, so that 
avoids this issue, but it is good to know about this additional 
memcg limit.
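If I follow 1b) correctly, it would look roughly like the sketch below 
(the cgroup name "backup" and the 1 GB cap are made-up values, and I'm 
assuming the knob Konstantin quotes just below is written the same way as 
the standard memcg files):

    # mkdir /sys/fs/cgroup/memory/backup
    # echo $((1024*1024*1024)) > /sys/fs/cgroup/memory/backup/memory.cache.limit_in_bytes
    # echo $$ > /sys/fs/cgroup/memory/backup/tasks   # move this shell into the cgroup
    # /path/to/our-backup-script.sh                  # hypothetical backup job, now pagecache-capped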


>
>       memcg::memory.cache.limit_in_bytes
>
> 2) if those processes are run by a Container owner:
>    a) you can limit the io/iops for the CT (thus pagecache is
> generated more slowly and the active processes
>       are considered active enough not to get into swap)

This is interesting and something I will run down.
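If I read the tooling right, that would be something along these lines 
(CT 123 and the values are only examples, and I still need to double-check 
the exact units against the man pages):

    # prlctl set 123 --iolimit 10      # disk bandwidth limit for the CT (MB/s, I believe)
    # prlctl set 123 --iopslimit 300   # IOPS limit for the CT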


> b) you can disable tcache for this Container via
> memcg::memory.disable_cleancache
>       (raise your hand if you wish me to explain what tcache is)

I'm all for additional information, as it can help to form proper 
opinions, if you don't mind providing it.
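In the meantime, for anyone else following along: my assumption is that 
this is another per-cgroup knob, so presumably something like (path 
guessed from the VZ7 machine.slice layout, $CT_UUID being the Container's 
UUID):

    # echo 1 > /sys/fs/cgroup/memory/machine.slice/$CT_UUID/memory.disable_cleancache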


> c) you can limit the maximum amount of memory which can be used for
> pagecache by this Container:
>       memcg::memory.cache.limit_in_bytes

This seems viable to test as well. Currently it seems to be using a very 
high, effectively 'unlimited' default. I assume the only way to set this 
is to interact directly with the memory cgroup, and not via a standard 
ve config value?
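If that assumption is right, I'd expect it to look something like this 
(again with a guessed machine.slice path, a placeholder $CT_UUID and an 
example 2 GB cap):

    # echo $((2*1024*1024*1024)) > /sys/fs/cgroup/memory/machine.slice/$CT_UUID/memory.cache.limit_in_bytes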


>
>
> These are only my guesses, of course, but this is the most frequent reason.
>
>> <skipped>
>>
>> "b) if there is free memory on the host, the Container's memory is saved
>> in a special swap cache in host's RAM and no real write to host's
>> physical swap occurs"
>>
>> But how can this be shown or proven? I do not believe this bit functions
>> at all on the nodes we run.
>
> Yes, I think vSwap does not work on your nodes, since no swap is
> configured for the Containers.
> To check the current number of tswap pages on the Hardware
> Node overall
> (tswap is the backend for vSwap, so these are exactly the pages which
> were going to be put into physical swap from Containers,
> but were put into "tswap" - a special cache in the Hardware Node's RAM):
>
> # grep Tswap /proc/meminfo
>
> If you wish to check whether vSwap works at all (I mean - whether it
> avoids physical swap I/O when there is free RAM on the node),
> * you can take an idle node for the experiment with, say, 3X RAM,
> * create a Container with, say, X RAM + X vSwap,
> * run a memory eater process for 2X RAM in the Container,
> * check the I/O on the node for the block device holding swap (say, in
> /proc/diskstats)


I assume that, regardless of whether we utilized vSwap or not, we would 
likely still experience these additional swapping issues, presumably from 
pagecache-heavy applications. Or would the use of vSwap intercept some of 
these pages, thus preventing them from being swapped to disk?


Thanks again for your reply and assistance. I'll start working on a few 
POC instances based on what you've given me so far.
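For the record, my rough plan for the vSwap experiment, with made-up IDs 
and sizes (CT 123, 4G RAM + 4G vSwap on a node with plenty of free RAM), 
a trivial Python memory eater that stays just under RAM + vSwap to avoid 
the CT's OOM killer, and assuming python3 is installed inside the CT:

    # swapon -s                          # note which device backs the host swap
    # vzctl set 123 --ram 4G --swap 4G --save
    # vzctl enter 123
    (inside the CT)
    # python3 -c 'x = b"a" * (7*1024**3); import time; time.sleep(600)'
    (back on the node, in another shell, while the eater is running)
    # grep Tswap /proc/meminfo           # should grow if vSwap absorbs the pressure
    # grep -w sda2 /proc/diskstats       # sda2 = whatever swapon -s reported; write counters should stay ~flat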


>
> Hope that helps.
>
> -- 
> Best regards,
>
> Konstantin Khorenko,
> Virtuozzo Linux Kernel Team
> _______________________________________________
> Users mailing list
> Users at openvz.org
> https://lists.openvz.org/mailman/listinfo/users


-- 
Sincerely
Daniel C Pearson
COO KnownHost, LLC
https://www.knownhost.com


