[Devel] [PATCH vz8 v2] mm/backing-dev: associate writeback with correct blkcg

Kirill Tkhai ktkhai at virtuozzo.com
Tue Jul 20 13:25:19 MSK 2021


On 20.07.2021 13:24, Konstantin Khorenko wrote:
> From: Andrey Zhadchenko <andrey.zhadchenko at virtuozzo.com>
> 
> Use cgroup_get_e_ve_css to get correct blkcg_css for writeback instances.
> 
> https://jira.sw.ru/browse/PSBM-131253
> 
> Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko at virtuozzo.com>
> Reviewed-by: Kirill Tkhai <ktkhai at virtuozzo.com>
> 
> v2:
> khorenko@: introduce a wrapper for getting blkcg_css from memcg_css.

Good idea
 
> ==========================
> mm/writeback: Adopt cgroup-v2 writeback (limit per-memcg dirty memory)
> 
> In cgroup-v1 all writeback IO is accounted to root blkcg by design. With
> cgroup-v2 it became possible to link memcg and blkcg, so writeback code
> was enhanced to
>  1) consider balancing dirty pages per memory cgroup
>  2) account writeback generated IO to blkcg
> 
> In vz7, writeback was balanced by means of the beancounter cgroup. However,
> we dropped it.
> 
> In vz8 @aryabinin tried to enable cgroup-v2 writeback with 5cc286c98ee20
> ("mm, cgroup, writeback: Enable per-cgroup writeback for v1 cgroup."),
> but cgroup_get_e_css(), which is used to find blkcg based on memcg,
> does not work well with cgroup-v1 and always returns root blkcg.
> 
> However we can implement a new function to associate blkcg with memcg via
> ve css_set.
> 
> Test results with 256M container without patch:
> ===============================================
>  # echo "253:22358 100000000" > /sys/fs/cgroup/blkio/machine.slice/1/blkio.throttle.write_bps_device
>  # vzctl exec 1 dd if=/dev/zero of=/test bs=1M count=1000
>  # 1048576000 bytes (1.0 GB, 1000 MiB) copied, 1.35522 s, 774 MB/s
> 
> Since dirty balancing is global, the container can dirty more than its RAM
> and blkio limits are not respected.
> 
> With patch:
> ===========
>  # echo "253:22765 100000000" > /sys/fs/cgroup/blkio/machine.slice/1/blkio.throttle.write_bps_device
>  # vzctl exec 1 dd if=/dev/zero of=/test bs=1M count=1000
>  # 1048576000 bytes (1.0 GB, 1000 MiB) copied, 10.2267 s, 103 MB/s
> 
> Per-ve dirty balancing and throttling work as expected.
> 
> v2:
> Since ve->ve_ns points to the task nsproxy, it can change during the ve
> lifetime. We already have a helper ve_get_init_css() that handles this
> case, so I decided to reuse its code in the new cgroup_get_e_ve_css().
> 
> Additionally I have added two patches that improve current code:
>  1) drop 'get' from css_get_local_root() name, since 'get' in css function
>     names usually implies taking a reference
>  2) drop duplicate code and reuse css_local_root() helper in
>     ve_get_init_css()
> 
> Andrey Zhadchenko (4):
>   kernel/cgroup: rename css_get_local_root
>   kernel/ve: simplify ve_get_init_css
>   kernel/cgroup: implement cgroup_get_e_ve_css
>   mm/backing-dev: associate writeback with correct blkcg
> ---
>  mm/backing-dev.c | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index cc2a3c0e6ae5..d520101d4a60 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -547,6 +547,22 @@ static void cgwb_remove_from_bdi_list(struct bdi_writeback *wb)
>  	spin_unlock_irq(&cgwb_lock);
>  }
>  
> +static inline struct cgroup_subsys_state *
> +cgroup_get_e_css_virtialized(struct cgroup *cgroup,
> +			     struct cgroup_subsys *ss)
> +{
> +	struct cgroup_subsys_state *css;
> +
> +#ifdef CONFIG_VE
> +	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> +		css = cgroup_get_e_ve_css(cgroup, ss);
> +	else
> +#endif
> +		css = cgroup_get_e_css(cgroup, ss);
> +
> +	return css;
> +}
> +
>  static int cgwb_create(struct backing_dev_info *bdi,
>  		       struct cgroup_subsys_state *memcg_css, gfp_t gfp)
>  {
> @@ -559,7 +575,8 @@ static int cgwb_create(struct backing_dev_info *bdi,
>  	int ret = 0;
>  
>  	memcg = mem_cgroup_from_css(memcg_css);
> -	blkcg_css = cgroup_get_e_css(memcg_css->cgroup, &io_cgrp_subsys);
> +	blkcg_css = cgroup_get_e_css_virtialized(memcg_css->cgroup,
> +						 &io_cgrp_subsys);
>  	blkcg = css_to_blkcg(blkcg_css);
>  	memcg_cgwb_list = &memcg->cgwb_list;
>  	blkcg_cgwb_list = &blkcg->cgwb_list;
> @@ -683,8 +700,9 @@ struct bdi_writeback *wb_get_create(struct backing_dev_info *bdi,
>  			struct cgroup_subsys_state *blkcg_css;
>  
>  			/* see whether the blkcg association has changed */
> -			blkcg_css = cgroup_get_e_css(memcg_css->cgroup,
> -						     &io_cgrp_subsys);
> +			blkcg_css = cgroup_get_e_css_virtialized(
> +							memcg_css->cgroup,
> +							&io_cgrp_subsys);
>  			if (unlikely(wb->blkcg_css != blkcg_css ||
>  				     !wb_tryget(wb)))
>  				wb = NULL;
> 
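
For anyone following the wrapper's control flow, here is a minimal userspace
sketch of the selection logic it is meant to implement, assuming the
conditional block is closed with #endif. Everything in it is a hypothetical
stand-in (struct css_stub, the stub_* functions, memcg_on_dfl,
pick_blkcg_css); none of it is the kernel API, it only models which lookup
gets chosen.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: the sketch is built with container (VE) support enabled. */
#define CONFIG_VE 1

/* Hypothetical stand-in for struct cgroup_subsys_state. */
struct css_stub {
	const char *name;
};

/* Distinct objects so the caller can tell which lookup was used. */
static struct css_stub generic_result = { "result of generic lookup" };
static struct css_stub ve_result = { "result of ve-aware lookup" };

/* Stand-in for cgroup_subsys_on_dfl(memory_cgrp_subsys). */
static bool memcg_on_dfl;

/* Stand-in for cgroup_get_e_css(). */
static struct css_stub *stub_get_e_css(void)
{
	return &generic_result;
}

/* Stand-in for cgroup_get_e_ve_css(). */
static struct css_stub *stub_get_e_ve_css(void)
{
	return &ve_result;
}

/*
 * Mirrors the shape of the wrapper: with CONFIG_VE and memcg on a v1
 * hierarchy, use the ve-aware lookup; otherwise fall through to the
 * generic one. Without CONFIG_VE only the last assignment is compiled.
 */
static struct css_stub *pick_blkcg_css(void)
{
	struct css_stub *css;

#ifdef CONFIG_VE
	if (!memcg_on_dfl)
		css = stub_get_e_ve_css();
	else
#endif
		css = stub_get_e_css();

	return css;
}

/* Exercises both branches of the selection. */
static void self_check(void)
{
	memcg_on_dfl = false;	/* memcg on cgroup-v1 */
	assert(pick_blkcg_css() == &ve_result);

	memcg_on_dfl = true;	/* memcg on the default (v2) hierarchy */
	assert(pick_blkcg_css() == &generic_result);
}
```

Calling self_check() from a small main() exercises both paths. The dangling
"else" before #endif is what lets the generic lookup serve as both the
non-VE build and the v2 fallback without duplicating the call.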


