[Devel] [PATCH vz8 v2] mm/backing-dev: associate writeback with correct blkcg
Kirill Tkhai
ktkhai at virtuozzo.com
Tue Jul 20 13:25:19 MSK 2021
On 20.07.2021 13:24, Konstantin Khorenko wrote:
> From: Andrey Zhadchenko <andrey.zhadchenko at virtuozzo.com>
>
> Use cgroup_get_e_ve_css() to get the correct blkcg_css for writeback instances.
>
> https://jira.sw.ru/browse/PSBM-131253
>
> Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko at virtuozzo.com>
> Reviewed-by: Kirill Tkhai <ktkhai at virtuozzo.com>
>
> v2:
> khorenko@: introduce a wrapper for getting blkcg_css from memcg_css.
Good idea
> ==========================
> mm/writeback: Adopt cgroup-v2 writeback (limit per-memcg dirty memory)
>
> In cgroup-v1, all writeback IO is accounted to the root blkcg by design. With
> cgroup-v2 it became possible to link memcg and blkcg, so the writeback code
> was enhanced to:
> 1) balance dirty pages per memory cgroup
> 2) account writeback-generated IO to the blkcg
>
> In vz7, writeback was balanced by means of the beancounter cgroup. However,
> we dropped it.
>
> In vz8, @aryabinin tried to enable cgroup-v2 writeback with 5cc286c98ee20
> ("mm, cgroup, writeback: Enable per-cgroup writeback for v1 cgroup."),
> but cgroup_get_e_css(), which is used to find the blkcg based on the memcg,
> does not work well with cgroup-v1 and always returns the root blkcg.
>
> However, we can implement a new function that associates the blkcg with the
> memcg via the ve css_set.
>
> Test results with 256M container without patch:
> ===============================================
> # echo "253:22358 100000000" > /sys/fs/cgroup/blkio/machine.slice/1/blkio.throttle.write_bps_device
> # vzctl exec 1 dd if=/dev/zero of=/test bs=1M count=1000
> # 1048576000 bytes (1.0 GB, 1000 MiB) copied, 1.35522 s, 774 MB/s
>
> Since dirty balancing is global, the container can dirty more memory than
> its RAM, and blkio limits are not respected.
>
> With patch:
> ===========
> # echo "253:22765 100000000" > /sys/fs/cgroup/blkio/machine.slice/1/blkio.throttle.write_bps_device
> # vzctl exec 1 dd if=/dev/zero of=/test bs=1M count=1000
> # 1048576000 bytes (1.0 GB, 1000 MiB) copied, 10.2267 s, 103 MB/s
>
> Per-ve dirty balancing and throttling work as expected.
>
> v2:
> Since ve->ve_ns points to the task's nsproxy, it can change during the ve
> lifetime. We already have a helper, ve_get_init_css(), that handles this
> case, so I decided to reuse its code in the new cgroup_get_e_ve_css().
>
> Additionally, I have added two patches that improve the current code:
> 1) drop 'get' from the css_get_local_root() name, since 'get' in css
> function names usually means taking a reference
> 2) drop duplicate code and reuse the css_local_root() helper in
> ve_get_init_css()
>
> Andrey Zhadchenko (4):
> kernel/cgroup: rename css_get_local_root
> kernel/ve: simplify ve_get_init_css
> kernel/cgroup: implement cgroup_get_e_ve_css
> mm/backing-dev: associate writeback with correct blkcg
> ---
> mm/backing-dev.c | 24 +++++++++++++++++++++---
> 1 file changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index cc2a3c0e6ae5..d520101d4a60 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -547,6 +547,22 @@ static void cgwb_remove_from_bdi_list(struct bdi_writeback *wb)
> spin_unlock_irq(&cgwb_lock);
> }
>
> +static inline struct cgroup_subsys_state *
> +cgroup_get_e_css_virtialized(struct cgroup *cgroup,
> +			     struct cgroup_subsys *ss)
> +{
> +	struct cgroup_subsys_state *css;
> +
> +#ifdef CONFIG_VE
> +	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> +		css = cgroup_get_e_ve_css(cgroup, ss);
> +	else
> +#endif
> +		css = cgroup_get_e_css(cgroup, ss);
> +
> +	return css;
> +}
> +
> static int cgwb_create(struct backing_dev_info *bdi,
> struct cgroup_subsys_state *memcg_css, gfp_t gfp)
> {
> @@ -559,7 +575,8 @@ static int cgwb_create(struct backing_dev_info *bdi,
> int ret = 0;
>
> memcg = mem_cgroup_from_css(memcg_css);
> - blkcg_css = cgroup_get_e_css(memcg_css->cgroup, &io_cgrp_subsys);
> + blkcg_css = cgroup_get_e_css_virtialized(memcg_css->cgroup,
> + &io_cgrp_subsys);
> blkcg = css_to_blkcg(blkcg_css);
> memcg_cgwb_list = &memcg->cgwb_list;
> blkcg_cgwb_list = &blkcg->cgwb_list;
> @@ -683,8 +700,9 @@ struct bdi_writeback *wb_get_create(struct backing_dev_info *bdi,
> struct cgroup_subsys_state *blkcg_css;
>
> /* see whether the blkcg association has changed */
> - blkcg_css = cgroup_get_e_css(memcg_css->cgroup,
> - &io_cgrp_subsys);
> + blkcg_css = cgroup_get_e_css_virtialized(
> + memcg_css->cgroup,
> + &io_cgrp_subsys);
> if (unlikely(wb->blkcg_css != blkcg_css ||
> !wb_tryget(wb)))
> wb = NULL;
>