[Devel] [PATCH RHEL9 COMMIT] mm: Memory cgroup page cache limit
Konstantin Khorenko
khorenko at virtuozzo.com
Wed Feb 8 21:56:00 MSK 2023
The commit is pushed to "branch-rh9-5.14.0-162.6.1.vz9.18.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh9-5.14.0-162.6.1.vz9.18.7
------>
commit b8bc3dbf5e184d1cb16acb10eea25565db3580bf
Author: Andrey Ryabinin <ryabinin.a.a at gmail.com>
Date: Fri Jan 20 14:39:32 2023 +0200
mm: Memory cgroup page cache limit
This feature enhances the memory cgroup so that it can limit its page
cache usage.
It exposes two memory cgroup files, one to set the limit and one to
check usage:
memory::memory.cache.limit_in_bytes
memory::memory.cache.usage_in_bytes
Rationale: imagine a system service whose anon memory you don't want to
limit.
In our case it's a vStorage cgroup which hosts CSes and MDSes:
* they can consume memory in some range
* we don't want to set the limit to the maximum possible consumption - too high
* we don't know the number of CSes on the node - the admin can add CSes
dynamically
* we don't want to dynamically increase/decrease the limit
If the cgroup is "unlimited", it puts permanent memory pressure on the
Node because it generates a lot of page cache, and other cgroups on the
Node are affected (even taking proportional fair reclaim into account).
=> the solution is to limit page cache only, and that is what this
feature implements.
============
Simple test:
dd if=/dev/random of=testfile.bin bs=1M count=1000
mkdir /sys/fs/cgroup/memory/pagecache_limiter
tee /sys/fs/cgroup/memory/pagecache_limiter/memory.cache.limit_in_bytes <<< $[2**24]
bash
echo $$ > /sys/fs/cgroup/memory/pagecache_limiter/tasks
cat /sys/fs/cgroup/memory/pagecache_limiter/memory.cache.usage_in_bytes
time wc -l testfile.bin
cat /sys/fs/cgroup/memory/pagecache_limiter/memory.cache.usage_in_bytes
echo 3 > /proc/sys/vm/drop_caches
cat /sys/fs/cgroup/memory/pagecache_limiter/memory.cache.usage_in_bytes
============
https://jira.sw.ru/browse/PSBM-77547 - initial problem
https://jira.sw.ru/browse/PSBM-78244 - feature jira ID
Original author of the feature:
Signed-off-by: Andrey Ryabinin <aryabinin at virtuozzo.com>
Feature: mm: Memory cgroup page cache limit
===============================================================
The feature has been ported from vz7 to vz9:
vz7 commit: da9151c89181 ("mm/memcg: limit page cache in memcg hack")
Reworked:
now we have no charge/cancel/commit/uncharge memcg API (we only have
charge/uncharge)
=> we have to track pages which were charged as page cache
=> an additional flag was introduced, implemented using the mm/page_ext.c
subsystem (see mm/page_vzext.c)
See ms commits:
0d1c2072 ("mm: memcontrol: switch to native NR_FILE_PAGES and NR_SHMEM
counters")
3fea5a49 ("mm: memcontrol: convert page cache to a new
mem_cgroup_charge() API")
https://jira.sw.ru/browse/PSBM-131957
https://jira.sw.ru/browse/PSBM-134013
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn at virtuozzo.com>
===============================================================
Rebase the feature from RHEL 9.0 to RHEL 9.1.
The original vz7 implementation consists of the following commits:
741beaa93c89 ("mm: introduce page vz extension (using page_ext)")
758d52e33a67 ("configs: Enable CONFIG_PAGE_EXTENSION")
d42d3c8b849d ("mm/memcg: limit page cache in memcg hack")
This port drops the page vz extensions in favor of using a memcg_data
bit to mark a page as cache. The benefit is that the implementation and
the porting become simpler. If new flags are required, the newly
introduced folio can be used.
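A simplified sketch of the resulting flag handling, excerpted from the
hunks below (kernel-internal code, names follow the patch, not a
standalone implementation):

    /* memcg_data flag bits (include/linux/memcontrol.h) */
    MEMCG_DATA_OBJCGS  = (1UL << 0),   /* slab objcg vector */
    MEMCG_DATA_KMEM    = (1UL << 1),   /* non-slab kernel page */
    MEMCG_DATA_PGCACHE = (1UL << 2),   /* charged as page cache */

    /* test whether a folio was charged as page cache */
    static inline bool folio_memcg_cache(struct folio *folio)
    {
            return folio->memcg_data & MEMCG_DATA_PGCACHE;
    }

    /* set in charge_memcg() on a successful cache charge; the flag is
     * cleared on uncharge together with the rest of memcg_data */
    if (cache_charge)
            WRITE_ONCE(folio->memcg_data,
                       READ_ONCE(folio->memcg_data) | MEMCG_DATA_PGCACHE);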
https://jira.sw.ru/browse/PSBM-144609
Signed-off-by: Alexander Atanasov <alexander.atanasov at virtuozzo.com>
===============================================================
===============================================================
Implementation decision details:
+++
mm/memcg: reclaim memory.cache.limit_in_bytes from background
Reclaiming memory above memory.cache.limit_in_bytes always in direct
reclaim mode adds too much cost for vstorage. Instead of direct reclaim,
allow memory.cache.limit_in_bytes to be overflowed, but launch the
reclaim in a background task.
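Roughly, the cache counter is allowed to go over its limit and the
existing high_work worker is kicked instead of reclaiming on the spot
(trimmed excerpt from the try_charge_memcg() hunk below):

    /* in try_charge_memcg(), while walking up the hierarchy */
    } else if (page_counter_read(&memcg->cache) > memcg->cache.max) {
            /* over the cache limit: do not reclaim here, defer the
             * work to the background worker (memcg->high_work) */
            if (!work_pending(&memcg->high_work))
                    schedule_work(&memcg->high_work);
    }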
https://pmc.acronis.com/browse/VSTOR-24395
https://jira.sw.ru/browse/PSBM-94761
Signed-off-by: Andrey Ryabinin <aryabinin at virtuozzo.com>
(cherry picked from commit c7235680e58c0d7d792e8f47264ef233d2752b0b)
see ms 1a3e1f40 ("mm: memcontrol: decouple reference counting from page accounting")
https://jira.sw.ru/browse/PSBM-131957
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn at virtuozzo.com>
+++
mm/memcg: fix cache growth above cache.limit_in_bytes
Exceeding cache.limit_in_bytes schedules high_work_func(), which tries
to reclaim 32 pages. If cache is generated fast enough, this allows the
cgroup to grow steadily above cache.limit_in_bytes because we don't
reclaim enough. Try to reclaim the exceeded amount of cache instead.
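With this change reclaim_high() targets the actual excess over the cache
limit rather than a fixed batch (simplified from the reclaim_high() hunk
below; the last argument of try_to_free_mem_cgroup_pages() is may_swap,
so anon memory is left alone):

    /* in reclaim_high() */
    cache_overused = page_counter_read(&memcg->cache) - memcg->cache.max;

    if (cache_overused > 0) {
            psi_memstall_enter(&pflags);
            nr_reclaimed += try_to_free_mem_cgroup_pages(memcg,
                            cache_overused, gfp_mask, false);
            psi_memstall_leave(&pflags);
    }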
https://jira.sw.ru/browse/PSBM-106384
Signed-off-by: Andrey Ryabinin <aryabinin at virtuozzo.com>
(cherry picked from commit 098f6a9add74a10848494427046cb8087ceb27d1)
https://jira.sw.ru/browse/PSBM-131957
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn at virtuozzo.com>
+++
mm/memcg: Use per-cpu stock charges for ->cache counter
Currently we use per-cpu stocks to do precharges of the ->memory and ->memsw
counters. Do this for ->kmem and ->cache as well to decrease contention
on these counters.
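In this port the per-cpu stock keeps a separate pool for ->cache
precharges, and consume_stock()/refill_stock() take a "cache" flag to
pick the pool (simplified sketch of the hunks below):

    struct memcg_stock_pcp {
            ...
            unsigned int nr_pages;        /* ->memory / ->memsw precharge */
            unsigned int cache_nr_pages;  /* ->cache precharge */
            ...
    };

    /* consume_stock(memcg, nr_pages, cache): take from the matching pool */
    if (memcg == stock->cached) {
            if (cache && stock->cache_nr_pages >= nr_pages) {
                    stock->cache_nr_pages -= nr_pages;
                    ret = true;
            }

            if (!cache && stock->nr_pages >= nr_pages) {
                    stock->nr_pages -= nr_pages;
                    ret = true;
            }
    }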
https://jira.sw.ru/browse/PSBM-101300
Signed-off-by: Andrey Ryabinin <aryabinin at virtuozzo.com>
(cherry picked from commit e1ae7b88d380d24a6df7c9b34635346726de39e3)
Original title:
mm/memcg: Use per-cpu stock charges for ->kmem and ->cache counters #PSBM-101300
Reworked:
the kmem part was dropped because this percpu charging functionality
appears to be covered by the ms commits (see below).
see ms:
bf4f0599 ("mm: memcg/slab: obj_cgroup API")
e1a366be ("mm: memcontrol: switch to rcu protection in drain_all_stock()")
1a3e1f40 ("mm: memcontrol: decouple reference counting from page accounting")
https://jira.sw.ru/browse/PSBM-131957
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn at virtuozzo.com>
+++
mm: Fix nil dereference in __mem_cgroup_charge_gen()
When we're running the kdump kernel, it starts up with cgroup_disable=memory,
i.e. without the memory cgroup. As a result, __mem_cgroup_charge_gen()
tries to dereference a NULL pointer. Add an appropriate guard here.
__mem_cgroup_charge_gen() has been introduced in the Virtuozzo kernel by
modifying __mem_cgroup_charge(), which (in the RHEL code) also does not
contain the check for memcg availability. But the absence of the check
does not lead to problems there because __mem_cgroup_charge() is always
called through the mem_cgroup_charge() wrapper which, in its turn,
contains the check for memcg availability.
So let's move the check from the upper mem_cgroup_charge() to the lower
__mem_cgroup_charge_gen().
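The resulting helper, as it appears in the hunk below, carries the check
for both the plain and the cache charge paths:

    static int __mem_cgroup_charge_gen(struct folio *folio, struct mm_struct *mm,
                                       gfp_t gfp_mask, bool cache_charge)
    {
            struct mem_cgroup *memcg;
            int ret;

            /* kdump boots with cgroup_disable=memory; bail out before
             * get_mem_cgroup_from_mm() touches a non-existent memcg */
            if (mem_cgroup_disabled())
                    return 0;

            memcg = get_mem_cgroup_from_mm(mm);
            ret = charge_memcg(folio, memcg, gfp_mask, cache_charge);
            css_put(&memcg->css);

            return ret;
    }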
https://jira.sw.ru/browse/PSBM-139098
Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
---
include/linux/memcontrol.h | 29 +++++-
mm/filemap.c | 3 +-
mm/memcontrol.c | 220 ++++++++++++++++++++++++++++++++++++---------
3 files changed, 208 insertions(+), 44 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 561db06f1fd8..6ba36eb051b8 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -273,6 +273,7 @@ struct mem_cgroup {
/* Legacy consumer-oriented counters */
struct page_counter kmem; /* v1 only */
struct page_counter tcpmem; /* v1 only */
+ struct page_counter cache;
/* Range enforcement for interrupt charges */
struct work_struct high_work;
@@ -405,8 +406,10 @@ enum page_memcg_data_flags {
MEMCG_DATA_OBJCGS = (1UL << 0),
/* page has been accounted as a non-slab kernel page */
MEMCG_DATA_KMEM = (1UL << 1),
+ /* page has been accounted as a cache page */
+ MEMCG_DATA_PGCACHE = (1UL << 2),
/* the next bit after the last actual flag */
- __NR_MEMCG_DATA_FLAGS = (1UL << 2),
+ __NR_MEMCG_DATA_FLAGS = (1UL << 3),
};
#define MEMCG_DATA_FLAGS_MASK (__NR_MEMCG_DATA_FLAGS - 1)
@@ -771,11 +774,25 @@ int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp);
static inline int mem_cgroup_charge(struct folio *folio, struct mm_struct *mm,
gfp_t gfp)
{
- if (mem_cgroup_disabled())
- return 0;
return __mem_cgroup_charge(folio, mm, gfp);
}
+int mem_cgroup_charge_cache(struct folio *folio, struct mm_struct *mm,
+ gfp_t gfp);
+
+/*
+ * folio_memcg_cache - Check if the folio has the pgcache flag set.
+ * @folio: Pointer to the folio.
+ *
+ * Checks if the folio has page cache flag set. The caller must ensure
+ * that the folio has an associated memory cgroup. It's not safe to call
+ * this function against some types of folios, e.g. slab folios.
+ */
+static inline bool folio_memcg_cache(struct folio *folio)
+{
+ return folio->memcg_data & MEMCG_DATA_PGCACHE;
+}
+
int mem_cgroup_swapin_charge_page(struct page *page, struct mm_struct *mm,
gfp_t gfp, swp_entry_t entry);
void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry);
@@ -1339,6 +1356,12 @@ static inline int mem_cgroup_charge(struct folio *folio,
return 0;
}
+static inline int mem_cgroup_charge_cache(struct folio *folio,
+ struct mm_struct *mm, gfp_t gfp)
+{
+ return 0;
+}
+
static inline int mem_cgroup_swapin_charge_page(struct page *page,
struct mm_struct *mm, gfp_t gfp, swp_entry_t entry)
{
diff --git a/mm/filemap.c b/mm/filemap.c
index 2d63e53980e4..d568ffc0d416 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -841,7 +841,8 @@ noinline int __filemap_add_folio(struct address_space *mapping,
mapping_set_update(&xas, mapping);
if (!huge) {
- int error = mem_cgroup_charge(folio, NULL, gfp);
+ int error = mem_cgroup_charge_cache(folio, NULL, gfp);
+
VM_BUG_ON_FOLIO(index & (folio_nr_pages(folio) - 1), folio);
if (error)
return error;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6fa13539f3e5..2cfa29bff963 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -218,6 +218,7 @@ enum res_type {
_OOM_TYPE,
_KMEM,
_TCP,
+ _CACHE,
};
#define MEMFILE_PRIVATE(x, val) ((x) << 16 | (val))
@@ -2207,6 +2208,7 @@ struct memcg_stock_pcp {
int nr_slab_unreclaimable_b;
#endif
+ unsigned int cache_nr_pages;
struct work_struct work;
unsigned long flags;
#define FLUSHING_CACHED_CHARGE 0
@@ -2248,7 +2250,8 @@ static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages)
*
* returns true if successful, false otherwise.
*/
-static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
+static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
+ bool cache)
{
struct memcg_stock_pcp *stock;
unsigned long flags;
@@ -2260,9 +2263,16 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
local_lock_irqsave(&memcg_stock.stock_lock, flags);
stock = this_cpu_ptr(&memcg_stock);
- if (memcg == stock->cached && stock->nr_pages >= nr_pages) {
- stock->nr_pages -= nr_pages;
- ret = true;
+ if (memcg == stock->cached) {
+ if (cache && stock->cache_nr_pages >= nr_pages) {
+ stock->cache_nr_pages -= nr_pages;
+ ret = true;
+ }
+
+ if (!cache && stock->nr_pages >= nr_pages) {
+ stock->nr_pages -= nr_pages;
+ ret = true;
+ }
}
local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
@@ -2276,15 +2286,20 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
static void drain_stock(struct memcg_stock_pcp *stock)
{
struct mem_cgroup *old = stock->cached;
+ unsigned long nr_pages = stock->nr_pages + stock->cache_nr_pages;
if (!old)
return;
- if (stock->nr_pages) {
- page_counter_uncharge(&old->memory, stock->nr_pages);
+ if (stock->cache_nr_pages)
+ page_counter_uncharge(&old->cache, stock->cache_nr_pages);
+
+ if (nr_pages) {
+ page_counter_uncharge(&old->memory, nr_pages);
if (do_memsw_account())
- page_counter_uncharge(&old->memsw, stock->nr_pages);
+ page_counter_uncharge(&old->memsw, nr_pages);
stock->nr_pages = 0;
+ stock->cache_nr_pages = 0;
}
css_put(&old->css);
@@ -2318,9 +2333,11 @@ static void drain_local_stock(struct work_struct *dummy)
* Cache charges(val) to local per_cpu area.
* This will be consumed by consume_stock() function, later.
*/
-static void __refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
+static void __refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
+ bool cache)
{
struct memcg_stock_pcp *stock;
+ unsigned long stock_nr_pages;
stock = this_cpu_ptr(&memcg_stock);
if (stock->cached != memcg) { /* reset if necessary */
@@ -2328,18 +2345,23 @@ static void __refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
css_get(&memcg->css);
stock->cached = memcg;
}
- stock->nr_pages += nr_pages;
+ if (!cache)
+ stock->nr_pages += nr_pages;
+ else
+ stock->cache_nr_pages += nr_pages;
- if (stock->nr_pages > MEMCG_CHARGE_BATCH)
+ stock_nr_pages = stock->nr_pages + stock->cache_nr_pages;
+ if (stock_nr_pages > MEMCG_CHARGE_BATCH)
drain_stock(stock);
}
-static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
+static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
+ bool cache)
{
unsigned long flags;
local_lock_irqsave(&memcg_stock.stock_lock, flags);
- __refill_stock(memcg, nr_pages);
+ __refill_stock(memcg, nr_pages, cache);
local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
}
@@ -2366,10 +2388,12 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
struct mem_cgroup *memcg;
bool flush = false;
+ unsigned long nr_pages = stock->nr_pages +
+ stock->cache_nr_pages;
rcu_read_lock();
memcg = stock->cached;
- if (memcg && stock->nr_pages &&
+ if (memcg && nr_pages &&
mem_cgroup_is_descendant(memcg, root_memcg))
flush = true;
else if (obj_stock_flush_required(stock, root_memcg))
@@ -2406,17 +2430,27 @@ static unsigned long reclaim_high(struct mem_cgroup *memcg,
do {
unsigned long pflags;
+ long cache_overused;
- if (page_counter_read(&memcg->memory) <=
- READ_ONCE(memcg->memory.high))
- continue;
+ if (page_counter_read(&memcg->memory) >
+ READ_ONCE(memcg->memory.high)) {
+ memcg_memory_event(memcg, MEMCG_HIGH);
+
+ psi_memstall_enter(&pflags);
+ nr_reclaimed += try_to_free_mem_cgroup_pages(memcg,
+ nr_pages, gfp_mask, true);
+ psi_memstall_leave(&pflags);
+ }
- memcg_memory_event(memcg, MEMCG_HIGH);
+ cache_overused = page_counter_read(&memcg->cache) -
+ memcg->cache.max;
- psi_memstall_enter(&pflags);
- nr_reclaimed += try_to_free_mem_cgroup_pages(memcg, nr_pages,
- gfp_mask, true);
- psi_memstall_leave(&pflags);
+ if (cache_overused > 0) {
+ psi_memstall_enter(&pflags);
+ nr_reclaimed += try_to_free_mem_cgroup_pages(memcg,
+ cache_overused, gfp_mask, false);
+ psi_memstall_leave(&pflags);
+ }
} while ((memcg = parent_mem_cgroup(memcg)) &&
!mem_cgroup_is_root(memcg));
@@ -2652,7 +2686,7 @@ void mem_cgroup_handle_over_high(void)
}
static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
- unsigned int nr_pages)
+ unsigned int nr_pages, bool cache_charge)
{
unsigned int batch = max(MEMCG_CHARGE_BATCH, nr_pages);
int nr_retries = MAX_RECLAIM_RETRIES;
@@ -2666,8 +2700,8 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
unsigned long pflags;
retry:
- if (consume_stock(memcg, nr_pages))
- return 0;
+ if (consume_stock(memcg, nr_pages, cache_charge))
+ goto done;
if (!do_memsw_account() ||
page_counter_try_charge(&memcg->memsw, batch, &counter)) {
@@ -2780,13 +2814,19 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
page_counter_charge(&memcg->memory, nr_pages);
if (do_memsw_account())
page_counter_charge(&memcg->memsw, nr_pages);
+ if (cache_charge)
+ page_counter_charge(&memcg->cache, nr_pages);
return 0;
done_restock:
+ if (cache_charge)
+ page_counter_charge(&memcg->cache, batch);
+
if (batch > nr_pages)
- refill_stock(memcg, batch - nr_pages);
+ refill_stock(memcg, batch - nr_pages, cache_charge);
+done:
/*
* If the hierarchy is above the normal consumption range, schedule
* reclaim on returning to userland. We can perform reclaim here
@@ -2826,6 +2866,9 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
current->memcg_nr_pages_over_high += batch;
set_notify_resume(current);
break;
+ } else if (page_counter_read(&memcg->cache) > memcg->cache.max) {
+ if (!work_pending(&memcg->high_work))
+ schedule_work(&memcg->high_work);
}
} while ((memcg = parent_mem_cgroup(memcg)));
@@ -2833,12 +2876,12 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
}
static inline int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
- unsigned int nr_pages)
+ unsigned int nr_pages, bool cache_charge)
{
if (mem_cgroup_is_root(memcg))
return 0;
- return try_charge_memcg(memcg, gfp_mask, nr_pages);
+ return try_charge_memcg(memcg, gfp_mask, nr_pages, cache_charge);
}
static inline void cancel_charge(struct mem_cgroup *memcg, unsigned int nr_pages)
@@ -3024,7 +3067,7 @@ static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
memcg = get_mem_cgroup_from_objcg(objcg);
memcg_account_kmem(memcg, -nr_pages);
- refill_stock(memcg, nr_pages);
+ refill_stock(memcg, nr_pages, false);
css_put(&memcg->css);
}
@@ -3045,7 +3088,7 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
memcg = get_mem_cgroup_from_objcg(objcg);
- ret = try_charge_memcg(memcg, gfp, nr_pages);
+ ret = try_charge_memcg(memcg, gfp, nr_pages, false);
if (ret)
goto out;
@@ -3204,7 +3247,7 @@ static struct obj_cgroup *drain_obj_stock(struct memcg_stock_pcp *stock)
memcg = get_mem_cgroup_from_objcg(old);
memcg_account_kmem(memcg, -nr_pages);
- __refill_stock(memcg, nr_pages);
+ __refill_stock(memcg, nr_pages, false);
css_put(&memcg->css);
}
@@ -3352,7 +3395,7 @@ int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp,
{
int ret = 0;
- ret = try_charge(memcg, gfp, nr_pages);
+ ret = try_charge(memcg, gfp, nr_pages, false);
if (!ret)
page_counter_charge(&memcg->kmem, nr_pages);
@@ -3711,6 +3754,9 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
case _TCP:
counter = &memcg->tcpmem;
break;
+ case _CACHE:
+ counter = &memcg->cache;
+ break;
default:
BUG();
}
@@ -3829,6 +3875,44 @@ static int memcg_update_tcp_max(struct mem_cgroup *memcg, unsigned long max)
return ret;
}
+static int memcg_update_cache_max(struct mem_cgroup *memcg,
+ unsigned long limit)
+{
+ unsigned long nr_pages;
+ bool enlarge = false;
+ int ret;
+
+ do {
+ if (signal_pending(current)) {
+ ret = -EINTR;
+ break;
+ }
+ mutex_lock(&memcg_max_mutex);
+
+ if (limit > memcg->cache.max)
+ enlarge = true;
+
+ ret = page_counter_set_max(&memcg->cache, limit);
+ mutex_unlock(&memcg_max_mutex);
+
+ if (!ret)
+ break;
+
+ nr_pages = max_t(long, 1,
+ page_counter_read(&memcg->cache) - limit);
+ if (!try_to_free_mem_cgroup_pages(memcg, nr_pages,
+ GFP_KERNEL, false)) {
+ ret = -EBUSY;
+ break;
+ }
+ } while (1);
+
+ if (!ret && enlarge)
+ memcg_oom_recover(memcg);
+
+ return ret;
+}
+
/*
* The user of this function is...
* RES_LIMIT.
@@ -3865,6 +3949,9 @@ static ssize_t mem_cgroup_write(struct kernfs_open_file *of,
case _TCP:
ret = memcg_update_tcp_max(memcg, nr_pages);
break;
+ case _CACHE:
+ ret = memcg_update_cache_max(memcg, nr_pages);
+ break;
}
break;
case RES_SOFT_LIMIT:
@@ -3898,6 +3985,9 @@ static ssize_t mem_cgroup_reset(struct kernfs_open_file *of, char *buf,
case _TCP:
counter = &memcg->tcpmem;
break;
+ case _CACHE:
+ counter = &memcg->cache;
+ break;
default:
BUG();
}
@@ -5541,6 +5631,17 @@ static struct cftype mem_cgroup_legacy_files[] = {
{
.name = "pressure_level",
},
+ {
+ .name = "cache.limit_in_bytes",
+ .private = MEMFILE_PRIVATE(_CACHE, RES_LIMIT),
+ .write = mem_cgroup_write,
+ .read_u64 = mem_cgroup_read_u64,
+ },
+ {
+ .name = "cache.usage_in_bytes",
+ .private = MEMFILE_PRIVATE(_CACHE, RES_USAGE),
+ .read_u64 = mem_cgroup_read_u64,
+ },
#ifdef CONFIG_NUMA
{
.name = "numa_stat",
@@ -5825,11 +5926,13 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
page_counter_init(&memcg->swap, &parent->swap);
page_counter_init(&memcg->kmem, &parent->kmem);
page_counter_init(&memcg->tcpmem, &parent->tcpmem);
+ page_counter_init(&memcg->cache, &parent->cache);
} else {
page_counter_init(&memcg->memory, NULL);
page_counter_init(&memcg->swap, NULL);
page_counter_init(&memcg->kmem, NULL);
page_counter_init(&memcg->tcpmem, NULL);
+ page_counter_init(&memcg->cache, NULL);
root_mem_cgroup = memcg;
return &memcg->css;
@@ -5950,6 +6053,7 @@ static void mem_cgroup_css_reset(struct cgroup_subsys_state *css)
page_counter_set_max(&memcg->swap, PAGE_COUNTER_MAX);
page_counter_set_max(&memcg->kmem, PAGE_COUNTER_MAX);
page_counter_set_max(&memcg->tcpmem, PAGE_COUNTER_MAX);
+ page_counter_set_max(&memcg->cache, PAGE_COUNTER_MAX);
page_counter_set_min(&memcg->memory, 0);
page_counter_set_low(&memcg->memory, 0);
page_counter_set_high(&memcg->memory, PAGE_COUNTER_MAX);
@@ -6051,7 +6155,8 @@ static int mem_cgroup_do_precharge(unsigned long count)
int ret;
/* Try a single bulk charge without reclaim first, kswapd may wake */
- ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, count);
+ ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, count,
+ false);
if (!ret) {
mc.precharge += count;
return ret;
@@ -6059,7 +6164,7 @@ static int mem_cgroup_do_precharge(unsigned long count)
/* Try charges one by one with reclaim, but do not retry */
while (count--) {
- ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1);
+ ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1, false);
if (ret)
return ret;
mc.precharge++;
@@ -7285,18 +7390,27 @@ void mem_cgroup_calculate_protection(struct mem_cgroup *root,
}
static int charge_memcg(struct folio *folio, struct mem_cgroup *memcg,
- gfp_t gfp)
+ gfp_t gfp, bool cache_charge)
{
long nr_pages = folio_nr_pages(folio);
int ret;
- ret = try_charge(memcg, gfp, nr_pages);
+ ret = try_charge(memcg, gfp, nr_pages, cache_charge);
if (ret)
goto out;
css_get(&memcg->css);
commit_charge(folio, memcg);
+ /*
+ * We always cleanup this flag on uncharging, it means
+ * that during charging we shouldn't have this flag set
+ */
+
+ VM_BUG_ON(folio_memcg_cache(folio));
+ if (cache_charge)
+ WRITE_ONCE(folio->memcg_data,
+ READ_ONCE(folio->memcg_data) | MEMCG_DATA_PGCACHE);
local_irq_disable();
mem_cgroup_charge_statistics(memcg, nr_pages);
memcg_check_events(memcg, folio_nid(folio));
@@ -7305,18 +7419,32 @@ static int charge_memcg(struct folio *folio, struct mem_cgroup *memcg,
return ret;
}
-int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp)
+static int __mem_cgroup_charge_gen(struct folio *folio, struct mm_struct *mm,
+ gfp_t gfp_mask, bool cache_charge)
{
struct mem_cgroup *memcg;
int ret;
+ if (mem_cgroup_disabled())
+ return 0;
+
memcg = get_mem_cgroup_from_mm(mm);
- ret = charge_memcg(folio, memcg, gfp);
+ ret = charge_memcg(folio, memcg, gfp_mask, cache_charge);
css_put(&memcg->css);
return ret;
}
+int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp)
+{
+ return __mem_cgroup_charge_gen(folio, mm, gfp, false);
+}
+
+int mem_cgroup_charge_cache(struct folio *folio, struct mm_struct *mm, gfp_t gfp)
+{
+ return __mem_cgroup_charge_gen(folio, mm, gfp, true);
+}
+
/**
* mem_cgroup_swapin_charge_page - charge a newly allocated page for swapin
* @page: page to charge
@@ -7347,7 +7475,7 @@ int mem_cgroup_swapin_charge_page(struct page *page, struct mm_struct *mm,
memcg = get_mem_cgroup_from_mm(mm);
rcu_read_unlock();
- ret = charge_memcg(folio, memcg, gfp);
+ ret = charge_memcg(folio, memcg, gfp, false);
css_put(&memcg->css);
return ret;
@@ -7391,6 +7519,7 @@ struct uncharge_gather {
unsigned long nr_memory;
unsigned long pgpgout;
unsigned long nr_kmem;
+ unsigned long nr_pgcache;
int nid;
};
@@ -7409,6 +7538,9 @@ static void uncharge_batch(const struct uncharge_gather *ug)
page_counter_uncharge(&ug->memcg->memsw, ug->nr_memory);
if (ug->nr_kmem)
memcg_account_kmem(ug->memcg, -ug->nr_kmem);
+ if (ug->nr_pgcache)
+ page_counter_uncharge(&ug->memcg->cache, ug->nr_pgcache);
+
memcg_oom_recover(ug->memcg);
}
@@ -7470,6 +7602,8 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
folio->memcg_data = 0;
obj_cgroup_put(objcg);
} else {
+ if (folio_memcg_cache(folio))
+ ug->nr_pgcache += nr_pages;
/* LRU pages aren't accounted at the root level */
if (!mem_cgroup_is_root(memcg))
ug->nr_memory += nr_pages;
@@ -7553,6 +7687,12 @@ void mem_cgroup_migrate(struct folio *old, struct folio *new)
page_counter_charge(&memcg->memsw, nr_pages);
}
+ WARN_ON((!PageAnon(&new->page) && !PageSwapBacked(&new->page)) |
+ folio_memcg_cache(new));
+
+ if (folio_memcg_cache(new))
+ page_counter_charge(&memcg->cache, nr_pages);
+
css_get(&memcg->css);
commit_charge(new, memcg);
@@ -7621,7 +7761,7 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages,
return false;
}
- if (try_charge(memcg, gfp_mask, nr_pages) == 0) {
+ if (try_charge(memcg, gfp_mask, nr_pages, false) == 0) {
mod_memcg_state(memcg, MEMCG_SOCK, nr_pages);
return true;
}
@@ -7643,7 +7783,7 @@ void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
mod_memcg_state(memcg, MEMCG_SOCK, -nr_pages);
- refill_stock(memcg, nr_pages);
+ refill_stock(memcg, nr_pages, false);
}
static int __init cgroup_memory(char *s)