[Devel] [PATCH] mm/memcg: limit page cache in memcg hack
Alexander Mikhalitsyn
alexander.mikhalitsyn at virtuozzo.com
Mon Oct 18 11:14:21 MSK 2021
From: Andrey Ryabinin <aryabinin at virtuozzo.com>
Add a new memcg file - memory.cache.limit_in_bytes.
It is used to limit page cache usage in a cgroup.
https://jira.sw.ru/browse/PSBM-77547
Signed-off-by: Andrey Ryabinin <aryabinin at virtuozzo.com>
khorenko@: use case:
Imagine a system service whose anon memory you don't want to limit
(in our case it's a vStorage cgroup which hosts CSes and MDSes; they can
consume memory in some range, and we don't want to set a limit for the maximum
possible consumption - it would be too high, and we don't know the number of
CSes on the node - the admin can add CSes dynamically. And we don't want to
increase/decrease the limit dynamically).
If the cgroup is "unlimited", it produces permanent memory pressure on the node
because it generates a lot of pagecache, and other cgroups on the node are
affected (even taking proportional fair reclaim into account).
=> the solution is to limit pagecache only, and that is what is implemented here.
Signed-off-by: Andrey Ryabinin <aryabinin at virtuozzo.com>
(cherry picked from commit da9151c891819733762a178b4efd7e44766fb8b1)
Reworked:
now we have no charge/cancel/commit/uncharge memcg API (we only have charge/uncharge)
=> we have to track pages which were charged as page cache => an additional flag
was introduced, implemented using the mm/page_ext.c subsystem (see mm/page_vzext.c)
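In short (the full implementation is in the diff below; this fragment is only
a condensed illustration of the scheme and is not compilable on its own):
===
/* charge path: page cache insertions now charge via the cache API */
error = mem_cgroup_charge_cache(page, NULL, gfp);
/* ... which, on success, also does: */
page_counter_charge(&memcg->cache, nr_pages);
SetVzPagePageCache(page);	/* vz page-extension flag, see page_vzflags.h */

/* uncharge path: the flag tells us the page was charged to ->cache */
if (PageVzPageCache(page)) {
	ug->nr_pgcache += nr_pages;	/* uncharged from memcg->cache in uncharge_batch() */
	ClearVzPagePageCache(page);
}
===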
See ms commits:
0d1c2072 ("mm: memcontrol: switch to native NR_FILE_PAGES and NR_SHMEM counters")
3fea5a49 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API")
https://jira.sw.ru/browse/PSBM-131957
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn at virtuozzo.com>
khorenko@:
v2:
1. hunk
===
done_restock:
+ if (cache_charge)
+ page_counter_charge(&memcg->cache, batch);
+
===
is moved to a later commit ("mm/memcg: Use per-cpu stock charges for
->cache counter")
2. "cache" field in struct mem_cgroup has been moved out of ifdef
3. copyright added to include/linux/page_vzext.h
v3: define mem_cgroup_charge_cache() for !CONFIG_MEMCG case
(cherry picked from commit 923c3f6d0c71499affd6fe2741aa7e2dcc565efa)
===+++
mm/memcg: reclaim memory.cache.limit_in_bytes from background
Reclaiming memory above memory.cache.limit_in_bytes always in direct
reclaim mode adds too much cost for vstorage. Instead of direct
reclaim, allow memory.cache.limit_in_bytes to be exceeded, but launch
the reclaim in a background task.
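The gist, as implemented in try_charge_memcg() in the diff below: when ->cache
is above its limit, kick the existing high_work worker instead of reclaiming
synchronously:
===
	} else if (page_counter_read(&memcg->cache) > memcg->cache.max) {
		if (!work_pending(&memcg->high_work))
			schedule_work(&memcg->high_work);
	}
===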
https://pmc.acronis.com/browse/VSTOR-24395
https://jira.sw.ru/browse/PSBM-94761
Signed-off-by: Andrey Ryabinin <aryabinin at virtuozzo.com>
(cherry picked from commit c7235680e58c0d7d792e8f47264ef233d2752b0b)
see ms 1a3e1f40 ("mm: memcontrol: decouple reference counting from page accounting")
https://jira.sw.ru/browse/PSBM-131957
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn at virtuozzo.com>
===+++
mm/memcg: fix cache growth above cache.limit_in_bytes
Exceeding cache.limit_in_bytes schedules high_work_func(), which tries to
reclaim only 32 pages. If cache is generated fast enough, this allows the
cgroup to steadily grow above cache.limit_in_bytes because we don't reclaim
enough. Try to reclaim the exceeded amount of cache instead.
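The corresponding reclaim_high() change (full hunk in the diff below) sizes the
reclaim request by how far ->cache is above its limit instead of using a fixed
batch:
===
	cache_overused = page_counter_read(&memcg->cache) - memcg->cache.max;

	if (cache_overused > 0) {
		psi_memstall_enter(&pflags);
		nr_reclaimed += try_to_free_mem_cgroup_pages(memcg,
				cache_overused, gfp_mask, false);
		psi_memstall_leave(&pflags);
	}
===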
https://jira.sw.ru/browse/PSBM-106384
Signed-off-by: Andrey Ryabinin <aryabinin at virtuozzo.com>
(cherry picked from commit 098f6a9add74a10848494427046cb8087ceb27d1)
https://jira.sw.ru/browse/PSBM-131957
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn at virtuozzo.com>
===+++
mm/memcg: Use per-cpu stock charges for ->cache counter
Currently we use per-cpu stocks to precharge the ->memory and ->memsw
counters. Do this for the ->kmem and ->cache counters as well to decrease
contention on them.
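Condensed from the diff below: the per-cpu stock grows a separate counter for
cache precharges, and consume/refill take a flag saying which counter to use:
===
struct memcg_stock_pcp {
	...
	unsigned int nr_pages;
	unsigned int cache_nr_pages;	/* new: stocked ->cache precharge */
	...
};

static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
			  bool cache);
static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
			 bool cache);
===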
https://jira.sw.ru/browse/PSBM-101300
Signed-off-by: Andrey Ryabinin <aryabinin at virtuozzo.com>
(cherry picked from commit e1ae7b88d380d24a6df7c9b34635346726de39e3)
Original title:
mm/memcg: Use per-cpu stock charges for ->kmem and ->cache counters #PSBM-101300
Reworked:
the kmem part was dropped because this per-cpu charging functionality looks
to be already covered by mainstream commits (see below).
see ms:
bf4f0599 ("mm: memcg/slab: obj_cgroup API")
e1a366be ("mm: memcontrol: switch to rcu protection in drain_all_stock()")
1a3e1f40 ("mm: memcontrol: decouple reference counting from page accounting")
https://jira.sw.ru/browse/PSBM-131957
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn at virtuozzo.com>
===+++
Reworked @amikhalitsyn:
1. Combined all fixups
120d68a2a mm/memcg: Use per-cpu stock charges for ->cache counter
3cc18f4f2 mm/memcg: fix cache growth above cache.limit_in_bytes
83677c3a3 mm/memcg: reclaim memory.cache.limit_in_bytes from background
to simplify feature porting in the future
2. added a new RO file "memory.cache.usage_in_bytes" which allows checking
how much page cache has been charged
3. rebased to RH9 kernel
See also:
18b2db3b03 ("mm: Convert page kmemcg type to a page memcg flag")
TODO for @amikhalitsyn:
take a look at "enum page_memcg_data_flags". It's worth trying to use it as
storage for the "page is page cache" flag instead of an external page extension.
===================================
Simple test:
dd if=/dev/random of=testfile.bin bs=1M count=1000
mkdir /sys/fs/cgroup/memory/pagecache_limiter
tee /sys/fs/cgroup/memory/pagecache_limiter/memory.cache.limit_in_bytes <<< $[2**24]
bash
echo $$ > /sys/fs/cgroup/memory/pagecache_limiter/tasks
cat /sys/fs/cgroup/memory/pagecache_limiter/memory.cache.usage_in_bytes
time wc -l testfile.bin
cat /sys/fs/cgroup/memory/pagecache_limiter/memory.cache.usage_in_bytes
echo 3 > /proc/sys/vm/drop_caches
cat /sys/fs/cgroup/memory/pagecache_limiter/memory.cache.usage_in_bytes
===================================
https://jira.sw.ru/browse/PSBM-134013
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn at virtuozzo.com>
---
include/linux/memcontrol.h | 9 ++
include/linux/page_vzflags.h | 37 ++++++
mm/filemap.c | 2 +-
mm/memcontrol.c | 249 ++++++++++++++++++++++++++++-------
4 files changed, 250 insertions(+), 47 deletions(-)
create mode 100644 include/linux/page_vzflags.h
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d56d77da80f9..7b07e3d01c14 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -255,6 +255,7 @@ struct mem_cgroup {
/* Legacy consumer-oriented counters */
struct page_counter kmem; /* v1 only */
struct page_counter tcpmem; /* v1 only */
+ struct page_counter cache;
/* Range enforcement for interrupt charges */
struct work_struct high_work;
@@ -716,6 +717,8 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg)
}
int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask);
+int mem_cgroup_charge_cache(struct page *page, struct mm_struct *mm,
+ gfp_t gfp_mask);
int mem_cgroup_swapin_charge_page(struct page *page, struct mm_struct *mm,
gfp_t gfp, swp_entry_t entry);
void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry);
@@ -1246,6 +1249,12 @@ static inline int mem_cgroup_charge(struct page *page, struct mm_struct *mm,
return 0;
}
+static inline int mem_cgroup_charge_cache(struct page *page, struct mm_struct *mm,
+ gfp_t gfp_mask)
+{
+ return 0;
+}
+
static inline int mem_cgroup_swapin_charge_page(struct page *page,
struct mm_struct *mm, gfp_t gfp, swp_entry_t entry)
{
diff --git a/include/linux/page_vzflags.h b/include/linux/page_vzflags.h
new file mode 100644
index 000000000000..d98e4ac619a7
--- /dev/null
+++ b/include/linux/page_vzflags.h
@@ -0,0 +1,37 @@
+/*
+ * include/linux/page_vzflags.h
+ *
+ * Copyright (c) 2021 Virtuozzo International GmbH. All rights reserved.
+ *
+ */
+
+#ifndef __LINUX_PAGE_VZFLAGS_H
+#define __LINUX_PAGE_VZFLAGS_H
+
+#include <linux/page_vzext.h>
+#include <linux/page-flags.h>
+
+enum vzpageflags {
+ PGVZ_pagecache,
+};
+
+#define TESTVZPAGEFLAG(uname, lname) \
+static __always_inline int PageVz##uname(struct page *page) \
+ { return get_page_vzext(page) && test_bit(PGVZ_##lname, &get_page_vzext(page)->vzflags); }
+
+#define SETVZPAGEFLAG(uname, lname) \
+static __always_inline void SetVzPage##uname(struct page *page) \
+ { if (get_page_vzext(page)) set_bit(PGVZ_##lname, &get_page_vzext(page)->vzflags); }
+
+#define CLEARVZPAGEFLAG(uname, lname) \
+static __always_inline void ClearVzPage##uname(struct page *page) \
+ { if (get_page_vzext(page)) clear_bit(PGVZ_##lname, &get_page_vzext(page)->vzflags); }
+
+#define VZPAGEFLAG(uname, lname) \
+ TESTVZPAGEFLAG(uname, lname) \
+ SETVZPAGEFLAG(uname, lname) \
+ CLEARVZPAGEFLAG(uname, lname)
+
+VZPAGEFLAG(PageCache, pagecache)
+
+#endif /* __LINUX_PAGE_VZFLAGS_H */
diff --git a/mm/filemap.c b/mm/filemap.c
index a5cedb2bce8b..34fb79766902 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -874,7 +874,7 @@ noinline int __add_to_page_cache_locked(struct page *page,
page->index = offset;
if (!huge) {
- error = mem_cgroup_charge(page, NULL, gfp);
+ error = mem_cgroup_charge_cache(page, NULL, gfp);
if (error)
goto error;
charged = true;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 995e41ab3227..89ead3df0b59 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -36,6 +36,7 @@
#include <linux/vm_event_item.h>
#include <linux/smp.h>
#include <linux/page-flags.h>
+#include <linux/page_vzflags.h>
#include <linux/backing-dev.h>
#include <linux/bit_spinlock.h>
#include <linux/rcupdate.h>
@@ -215,6 +216,7 @@ enum res_type {
_OOM_TYPE,
_KMEM,
_TCP,
+ _CACHE,
};
#define MEMFILE_PRIVATE(x, val) ((x) << 16 | (val))
@@ -2158,6 +2160,7 @@ struct memcg_stock_pcp {
struct obj_stock task_obj;
struct obj_stock irq_obj;
+ unsigned int cache_nr_pages;
struct work_struct work;
unsigned long flags;
#define FLUSHING_CACHED_CHARGE 0
@@ -2227,7 +2230,8 @@ static inline void put_obj_stock(unsigned long flags)
*
* returns true if successful, false otherwise.
*/
-static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
+static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
+ bool cache)
{
struct memcg_stock_pcp *stock;
unsigned long flags;
@@ -2239,9 +2243,16 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
local_irq_save(flags);
stock = this_cpu_ptr(&memcg_stock);
- if (memcg == stock->cached && stock->nr_pages >= nr_pages) {
- stock->nr_pages -= nr_pages;
- ret = true;
+ if (memcg == stock->cached) {
+ if (cache && stock->cache_nr_pages >= nr_pages) {
+ stock->cache_nr_pages -= nr_pages;
+ ret = true;
+ }
+
+ if (!cache && stock->nr_pages >= nr_pages) {
+ stock->nr_pages -= nr_pages;
+ ret = true;
+ }
}
local_irq_restore(flags);
@@ -2255,15 +2266,20 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
static void drain_stock(struct memcg_stock_pcp *stock)
{
struct mem_cgroup *old = stock->cached;
+ unsigned long nr_pages = stock->nr_pages + stock->cache_nr_pages;
if (!old)
return;
- if (stock->nr_pages) {
- page_counter_uncharge(&old->memory, stock->nr_pages);
+ if (stock->cache_nr_pages)
+ page_counter_uncharge(&old->cache, stock->cache_nr_pages);
+
+ if (nr_pages) {
+ page_counter_uncharge(&old->memory, nr_pages);
if (do_memsw_account())
- page_counter_uncharge(&old->memsw, stock->nr_pages);
+ page_counter_uncharge(&old->memsw, nr_pages);
stock->nr_pages = 0;
+ stock->cache_nr_pages = 0;
}
css_put(&old->css);
@@ -2295,10 +2311,12 @@ static void drain_local_stock(struct work_struct *dummy)
* Cache charges(val) to local per_cpu area.
* This will be consumed by consume_stock() function, later.
*/
-static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
+static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
+ bool cache)
{
struct memcg_stock_pcp *stock;
unsigned long flags;
+ unsigned long stock_nr_pages;
local_irq_save(flags);
@@ -2308,9 +2326,14 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
css_get(&memcg->css);
stock->cached = memcg;
}
- stock->nr_pages += nr_pages;
- if (stock->nr_pages > MEMCG_CHARGE_BATCH)
+ if (cache)
+ stock->cache_nr_pages += nr_pages;
+ else
+ stock->nr_pages += nr_pages;
+
+ stock_nr_pages = stock->nr_pages + stock->cache_nr_pages;
+ if (stock_nr_pages > MEMCG_CHARGE_BATCH)
drain_stock(stock);
local_irq_restore(flags);
@@ -2338,10 +2361,12 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
struct mem_cgroup *memcg;
bool flush = false;
+ unsigned long nr_pages = stock->nr_pages +
+ stock->cache_nr_pages;
rcu_read_lock();
memcg = stock->cached;
- if (memcg && stock->nr_pages &&
+ if (memcg && nr_pages &&
mem_cgroup_is_descendant(memcg, root_memcg))
flush = true;
if (obj_stock_flush_required(stock, root_memcg))
@@ -2405,17 +2430,27 @@ static unsigned long reclaim_high(struct mem_cgroup *memcg,
do {
unsigned long pflags;
+ long cache_overused;
- if (page_counter_read(&memcg->memory) <=
- READ_ONCE(memcg->memory.high))
- continue;
+ if (page_counter_read(&memcg->memory) >
+ READ_ONCE(memcg->memory.high)) {
+ memcg_memory_event(memcg, MEMCG_HIGH);
+
+ psi_memstall_enter(&pflags);
+ nr_reclaimed += try_to_free_mem_cgroup_pages(memcg,
+ nr_pages, gfp_mask, true);
+ psi_memstall_leave(&pflags);
+ }
- memcg_memory_event(memcg, MEMCG_HIGH);
+ cache_overused = page_counter_read(&memcg->cache) -
+ memcg->cache.max;
- psi_memstall_enter(&pflags);
- nr_reclaimed += try_to_free_mem_cgroup_pages(memcg, nr_pages,
- gfp_mask, true);
- psi_memstall_leave(&pflags);
+ if (cache_overused > 0) {
+ psi_memstall_enter(&pflags);
+ nr_reclaimed += try_to_free_mem_cgroup_pages(memcg,
+ cache_overused, gfp_mask, false);
+ psi_memstall_leave(&pflags);
+ }
} while ((memcg = parent_mem_cgroup(memcg)) &&
!mem_cgroup_is_root(memcg));
@@ -2651,7 +2686,7 @@ void mem_cgroup_handle_over_high(void)
}
static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
- unsigned int nr_pages)
+ unsigned int nr_pages, bool cache_charge)
{
unsigned int batch = max(MEMCG_CHARGE_BATCH, nr_pages);
int nr_retries = MAX_RECLAIM_RETRIES;
@@ -2664,8 +2699,8 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
unsigned long pflags;
retry:
- if (consume_stock(memcg, nr_pages))
- return 0;
+ if (consume_stock(memcg, nr_pages, cache_charge))
+ goto done;
if (!do_memsw_account() ||
page_counter_try_charge(&memcg->memsw, batch, &counter)) {
@@ -2790,13 +2825,19 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
page_counter_charge(&memcg->memory, nr_pages);
if (do_memsw_account())
page_counter_charge(&memcg->memsw, nr_pages);
+ if (cache_charge)
+ page_counter_charge(&memcg->cache, nr_pages);
return 0;
done_restock:
+ if (cache_charge)
+ page_counter_charge(&memcg->cache, batch);
+
if (batch > nr_pages)
- refill_stock(memcg, batch - nr_pages);
+ refill_stock(memcg, batch - nr_pages, cache_charge);
+done:
/*
* If the hierarchy is above the normal consumption range, schedule
* reclaim on returning to userland. We can perform reclaim here
@@ -2836,6 +2877,9 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
current->memcg_nr_pages_over_high += batch;
set_notify_resume(current);
break;
+ } else if (page_counter_read(&memcg->cache) > memcg->cache.max) {
+ if (!work_pending(&memcg->high_work))
+ schedule_work(&memcg->high_work);
}
} while ((memcg = parent_mem_cgroup(memcg)));
@@ -2843,12 +2887,12 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
}
static inline int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
- unsigned int nr_pages)
+ unsigned int nr_pages, bool cache_charge)
{
if (mem_cgroup_is_root(memcg))
return 0;
- return try_charge_memcg(memcg, gfp_mask, nr_pages);
+ return try_charge_memcg(memcg, gfp_mask, nr_pages, cache_charge);
}
#if defined(CONFIG_MEMCG_KMEM) || defined(CONFIG_MMU)
@@ -3064,7 +3108,7 @@ static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
page_counter_uncharge(&memcg->kmem, nr_pages);
- refill_stock(memcg, nr_pages);
+ refill_stock(memcg, nr_pages, false);
css_put(&memcg->css);
}
@@ -3086,7 +3130,7 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
memcg = get_mem_cgroup_from_objcg(objcg);
- ret = try_charge_memcg(memcg, gfp, nr_pages);
+ ret = try_charge_memcg(memcg, gfp, nr_pages, false);
if (ret)
goto out;
@@ -3384,7 +3428,7 @@ int memcg_charge_kmem(struct mem_cgroup *memcg, gfp_t gfp,
{
int ret = 0;
- ret = try_charge(memcg, gfp, nr_pages);
+ ret = try_charge(memcg, gfp, nr_pages, false);
if (!ret)
page_counter_charge(&memcg->kmem, nr_pages);
@@ -3743,6 +3787,9 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
case _TCP:
counter = &memcg->tcpmem;
break;
+ case _CACHE:
+ counter = &memcg->cache;
+ break;
default:
BUG();
}
@@ -3905,6 +3952,43 @@ static int memcg_update_tcp_max(struct mem_cgroup *memcg, unsigned long max)
return ret;
}
+static int memcg_update_cache_max(struct mem_cgroup *memcg,
+ unsigned long limit)
+{
+ unsigned long nr_pages;
+ bool enlarge = false;
+ int ret;
+
+ do {
+ if (signal_pending(current)) {
+ ret = -EINTR;
+ break;
+ }
+ mutex_lock(&memcg_max_mutex);
+
+ if (limit > memcg->cache.max)
+ enlarge = true;
+
+ ret = page_counter_set_max(&memcg->cache, limit);
+ mutex_unlock(&memcg_max_mutex);
+
+ if (!ret)
+ break;
+
+ nr_pages = max_t(long, 1, page_counter_read(&memcg->cache) - limit);
+ if (!try_to_free_mem_cgroup_pages(memcg, nr_pages,
+ GFP_KERNEL, false)) {
+ ret = -EBUSY;
+ break;
+ }
+ } while (1);
+
+ if (!ret && enlarge)
+ memcg_oom_recover(memcg);
+
+ return ret;
+}
+
/*
* The user of this function is...
* RES_LIMIT.
@@ -3943,6 +4027,9 @@ static ssize_t mem_cgroup_write(struct kernfs_open_file *of,
case _TCP:
ret = memcg_update_tcp_max(memcg, nr_pages);
break;
+ case _CACHE:
+ ret = memcg_update_cache_max(memcg, nr_pages);
+ break;
}
break;
case RES_SOFT_LIMIT:
@@ -3972,6 +4059,9 @@ static ssize_t mem_cgroup_reset(struct kernfs_open_file *of, char *buf,
case _TCP:
counter = &memcg->tcpmem;
break;
+ case _CACHE:
+ counter = &memcg->cache;
+ break;
default:
BUG();
}
@@ -5594,6 +5684,17 @@ static struct cftype mem_cgroup_legacy_files[] = {
{
.name = "pressure_level",
},
+ {
+ .name = "cache.limit_in_bytes",
+ .private = MEMFILE_PRIVATE(_CACHE, RES_LIMIT),
+ .write = mem_cgroup_write,
+ .read_u64 = mem_cgroup_read_u64,
+ },
+ {
+ .name = "cache.usage_in_bytes",
+ .private = MEMFILE_PRIVATE(_CACHE, RES_USAGE),
+ .read_u64 = mem_cgroup_read_u64,
+ },
#ifdef CONFIG_NUMA
{
.name = "numa_stat",
@@ -5907,11 +6008,13 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
page_counter_init(&memcg->swap, &parent->swap);
page_counter_init(&memcg->kmem, &parent->kmem);
page_counter_init(&memcg->tcpmem, &parent->tcpmem);
+ page_counter_init(&memcg->cache, &parent->cache);
} else {
page_counter_init(&memcg->memory, NULL);
page_counter_init(&memcg->swap, NULL);
page_counter_init(&memcg->kmem, NULL);
page_counter_init(&memcg->tcpmem, NULL);
+ page_counter_init(&memcg->cache, NULL);
root_mem_cgroup = memcg;
return &memcg->css;
@@ -6032,6 +6135,7 @@ static void mem_cgroup_css_reset(struct cgroup_subsys_state *css)
page_counter_set_max(&memcg->swap, PAGE_COUNTER_MAX);
page_counter_set_max(&memcg->kmem, PAGE_COUNTER_MAX);
page_counter_set_max(&memcg->tcpmem, PAGE_COUNTER_MAX);
+ page_counter_set_max(&memcg->cache, PAGE_COUNTER_MAX);
page_counter_set_min(&memcg->memory, 0);
page_counter_set_low(&memcg->memory, 0);
page_counter_set_high(&memcg->memory, PAGE_COUNTER_MAX);
@@ -6103,7 +6207,8 @@ static int mem_cgroup_do_precharge(unsigned long count)
int ret;
/* Try a single bulk charge without reclaim first, kswapd may wake */
- ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, count);
+ ret = try_charge(mc.to, GFP_KERNEL & ~__GFP_DIRECT_RECLAIM, count,
+ false);
if (!ret) {
mc.precharge += count;
return ret;
@@ -6111,7 +6216,7 @@ static int mem_cgroup_do_precharge(unsigned long count)
/* Try charges one by one with reclaim, but do not retry */
while (count--) {
- ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1);
+ ret = try_charge(mc.to, GFP_KERNEL | __GFP_NORETRY, 1, false);
if (ret)
return ret;
mc.precharge++;
@@ -7333,18 +7438,30 @@ void mem_cgroup_calculate_protection(struct mem_cgroup *root,
}
static int __mem_cgroup_charge(struct page *page, struct mem_cgroup *memcg,
- gfp_t gfp)
+ gfp_t gfp, bool cache_charge)
{
unsigned int nr_pages = thp_nr_pages(page);
int ret;
- ret = try_charge(memcg, gfp, nr_pages);
+ ret = try_charge(memcg, gfp, nr_pages, cache_charge);
if (ret)
goto out;
css_get(&memcg->css);
commit_charge(page, memcg);
+ /*
+ * Here we set an extended flag (see page_vzflags.h)
+ * on the page, which indicates that the page is charged as
+ * a "page cache" page.
+ *
+ * We always clean up this flag on uncharging, which means
+ * that while charging a page we shouldn't have this flag set.
+ */
+ BUG_ON(PageVzPageCache(page));
+ if (cache_charge)
+ SetVzPagePageCache(page);
+
local_irq_disable();
mem_cgroup_charge_statistics(memcg, page, nr_pages);
memcg_check_events(memcg, page);
@@ -7353,6 +7470,22 @@ static int __mem_cgroup_charge(struct page *page, struct mem_cgroup *memcg,
return ret;
}
+static int __mem_cgroup_charge_gen(struct page *page, struct mm_struct *mm,
+ gfp_t gfp_mask, bool cache_charge)
+{
+ struct mem_cgroup *memcg;
+ int ret;
+
+ if (mem_cgroup_disabled())
+ return 0;
+
+ memcg = get_mem_cgroup_from_mm(mm);
+ ret = __mem_cgroup_charge(page, memcg, gfp_mask, cache_charge);
+ css_put(&memcg->css);
+
+ return ret;
+}
+
/**
* mem_cgroup_charge - charge a newly allocated page to a cgroup
* @page: page to charge
@@ -7369,17 +7502,12 @@ static int __mem_cgroup_charge(struct page *page, struct mem_cgroup *memcg,
*/
int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
{
- struct mem_cgroup *memcg;
- int ret;
-
- if (mem_cgroup_disabled())
- return 0;
-
- memcg = get_mem_cgroup_from_mm(mm);
- ret = __mem_cgroup_charge(page, memcg, gfp_mask);
- css_put(&memcg->css);
+ return __mem_cgroup_charge_gen(page, mm, gfp_mask, false);
+}
- return ret;
+int mem_cgroup_charge_cache(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
+{
+ return __mem_cgroup_charge_gen(page, mm, gfp_mask, true);
}
/**
@@ -7411,7 +7539,7 @@ int mem_cgroup_swapin_charge_page(struct page *page, struct mm_struct *mm,
memcg = get_mem_cgroup_from_mm(mm);
rcu_read_unlock();
- ret = __mem_cgroup_charge(page, memcg, gfp);
+ ret = __mem_cgroup_charge(page, memcg, gfp, false);
css_put(&memcg->css);
return ret;
@@ -7455,6 +7583,7 @@ struct uncharge_gather {
unsigned long nr_memory;
unsigned long pgpgout;
unsigned long nr_kmem;
+ unsigned long nr_pgcache;
struct page *dummy_page;
};
@@ -7473,6 +7602,9 @@ static void uncharge_batch(const struct uncharge_gather *ug)
page_counter_uncharge(&ug->memcg->memsw, ug->nr_memory);
if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && ug->nr_kmem)
page_counter_uncharge(&ug->memcg->kmem, ug->nr_kmem);
+ if (ug->nr_pgcache)
+ page_counter_uncharge(&ug->memcg->cache, ug->nr_pgcache);
+
memcg_oom_recover(ug->memcg);
}
@@ -7535,6 +7667,16 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
page->memcg_data = 0;
obj_cgroup_put(objcg);
} else {
+ if (PageVzPageCache(page)) {
+ ug->nr_pgcache += nr_pages;
+ /*
+ * If we are here, it means that the page *will* be
+ * uncharged anyway. We can safely clear the
+ * "page is charged as a page cache" flag here.
+ */
+ ClearVzPagePageCache(page);
+ }
+
/* LRU pages aren't accounted at the root level */
if (!mem_cgroup_is_root(memcg))
ug->nr_memory += nr_pages;
@@ -7633,6 +7775,21 @@ void mem_cgroup_migrate(struct page *oldpage, struct page *newpage)
page_counter_charge(&memcg->memsw, nr_pages);
}
+ /*
+ * copy_page_vzflags() is called before mem_cgroup_migrate()
+ * in migrate_page_states() (mm/migrate.c).
+ *
+ * Let's check that all is fine with the flags:
+ * on the one hand, page cache pages are never
+ * anonymous and never swap backed;
+ * on the other hand, such pages must have the
+ * PageVzPageCache(page) ext flag set.
+ */
+ WARN_ON((!PageAnon(newpage) && !PageSwapBacked(newpage)) !=
+ PageVzPageCache(newpage));
+ if (PageVzPageCache(newpage))
+ page_counter_charge(&memcg->cache, nr_pages);
+
css_get(&memcg->css);
commit_charge(newpage, memcg);
@@ -7704,10 +7861,10 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
mod_memcg_state(memcg, MEMCG_SOCK, nr_pages);
- if (try_charge(memcg, gfp_mask, nr_pages) == 0)
+ if (try_charge(memcg, gfp_mask, nr_pages, false) == 0)
return true;
- try_charge(memcg, gfp_mask|__GFP_NOFAIL, nr_pages);
+ try_charge(memcg, gfp_mask|__GFP_NOFAIL, nr_pages, false);
return false;
}
@@ -7725,7 +7882,7 @@ void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
mod_memcg_state(memcg, MEMCG_SOCK, -nr_pages);
- refill_stock(memcg, nr_pages);
+ refill_stock(memcg, nr_pages, false);
}
static int __init cgroup_memory(char *s)
--
2.31.1