[Devel] [PATCH RHEL8 COMMIT] sched: show CPU stats for a cgroup in cpu.proc.stat file

Konstantin Khorenko khorenko at virtuozzo.com
Mon Jul 12 15:34:15 MSK 2021


The commit is pushed to "branch-rh8-4.18.0-240.1.1.vz8.5.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh8-4.18.0-240.1.1.vz8.5.54
------>
commit 8c8d3208cdc91c7570cf52474190216940602b62
Author: Evgenii Shatokhin <eshatokhin at virtuozzo.com>
Date:   Mon Jul 12 15:34:15 2021 +0300

    sched: show CPU stats for a cgroup in cpu.proc.stat file
    
    To implement its policies, vcmmd needs stats for each CPU core used by a
    given container or VM, similar to what /proc/stat shows for the system
    as a whole. The VZ8 kernel already has the VZCTL_GET_CPU_STAT ioctl to
    fetch CPU stats; however, it appears to report only total CPU times, not
    per-core values, which is not enough here.
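    
    For illustration, a consumer such as vcmmd could then read per-core
    stats from the container's cgroup directly. A minimal userspace sketch
    (the cgroup mount point and container path below are assumptions, and
    the output format is expected to mirror /proc/stat):
    
        #include <stdio.h>
    
        int main(void)
        {
                /*
                 * The exact path is an assumption; it depends on where the
                 * "cpu" hierarchy is mounted and how the CT cgroup is named.
                 */
                const char *path =
                        "/sys/fs/cgroup/cpu/machine.slice/CT1/cpu.proc.stat";
                char line[256];
                FILE *f = fopen(path, "r");
    
                if (!f) {
                        perror(path);
                        return 1;
                }
                /*
                 * Expected to resemble /proc/stat: a "cpu" summary line
                 * plus one "cpuN ..." line per core used by the cgroup.
                 */
                while (fgets(line, sizeof(line), f))
                        fputs(line, stdout);
                fclose(f);
                return 0;
        }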
    
    In VZ7, part of commit 33cf55658533 ("sched: use cpuacct->cpustat for showing
    cpu stats") added a "cpu.proc.stat" file to each cgroup with the "cpu"
    subsystem for that purpose. Data from both the "cpu" and "cpuacct" subsystems
    was needed, but it was assumed that these subsystems were always mounted
    together, so a cgroup could have either both or none.
    
    This patch adds support for "cpu.proc.stat" to VZ8, building on top of
    the cpu_cgroup_proc_stat() machinery already ported here in commit
    90368f957e01 ("ve/sched/stat: Introduce functions to calculate vcpustat data").
    As in VZ7, data from both "cpu" and "cpuacct" is needed. For consistency with
    VZ7, the file belongs to the "cpu" subsystem (cgroup v1 prefixes the file name
    with the controller name, hence "cpu.proc.stat"), so the handler looks up
    "cpuacct" through the cgroup.
    
    rcu_read_lock/unlock and css_get/put are probably not needed here (the
    file belonging to the cgroup is open at the moment, so the cgroup cannot
    go away, and neither can the "cpu" subsystem). However, they are kept to
    make code analysis tools happier and to cover a theoretical scenario
    where the "cpuacct" subsystem is somehow used independently of the "cpu"
    subsystem.
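    
    In isolation, the lookup-and-pin step this describes can be condensed
    into a small helper (the helper name is made up for illustration; the
    body mirrors the hunk below):
    
        /*
         * Safely pin another subsystem's css before using it; the caller
         * must do css_put() when done.
         */
        static struct cgroup_subsys_state *
        pin_cpuacct_css(struct cgroup_subsys_state *cpu_css)
        {
                struct cgroup_subsys_state *css;
    
                rcu_read_lock();        /* stabilizes cgroup->subsys[] */
                css = rcu_dereference(
                        cpu_css->cgroup->subsys[cpuacct_cgrp_id]);
                if (css)
                        css_get(css);   /* reference survives the unlock */
                rcu_read_unlock();
                return css;             /* NULL if "cpuacct" is absent */
        }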
    
    https://jira.sw.ru/browse/PSBM-101155
    
    Signed-off-by: Evgenii Shatokhin <eshatokhin at virtuozzo.com>
    Reviewed-by: Konstantin Khorenko <khorenko at virtuozzo.com>
---
 kernel/sched/core.c    |  6 ++++++
 kernel/sched/cpuacct.c | 30 ++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c2880cf6cf60..bdd3217c5cc8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7398,6 +7398,8 @@ int cpu_cgroup_proc_loadavg(struct cgroup_subsys_state *css,
 	return 0;
 }
 
+int cpu_cgroup_proc_stat_show(struct seq_file *sf, void *v);
+
 static struct cftype cpu_legacy_files[] = {
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	{
@@ -7446,6 +7448,10 @@ static struct cftype cpu_legacy_files[] = {
 		.write_u64 = cpu_rt_period_write_uint,
 	},
 #endif
+	{
+		.name = "proc.stat",
+		.seq_show = cpu_cgroup_proc_stat_show,
+	},
 	{ }	/* Terminate */
 };
 
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 33b6987700c2..a1522c878472 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -779,3 +779,33 @@ int cpu_cgroup_get_stat(struct cgroup_subsys_state *cpu_css,
 
 	return 0;
 }
+
+int cpu_cgroup_proc_stat_show(struct seq_file *sf, void *v)
+{
+	struct cgroup_subsys_state *cpu_css = seq_css(sf);
+	struct cgroup_subsys_state *cpuacct_css;
+	int ret;
+
+	/*
+	 * The cgroup the file is associated with should not disappear from
+	 * under us (the file is open, after all). Still, it won't hurt to
+	 * use RCU read-side lock as cgroup->subsys[] might need it.
+	 */
+	rcu_read_lock();
+	/*
+	 * Data from both 'cpu' and 'cpuacct' subsystems are needed. These
+	 * subsystems are often used together, but let us check if 'cpuacct'
+	 * is available for the cgroup, just in case.
+	 */
+	cpuacct_css = rcu_dereference(cpu_css->cgroup->subsys[cpuacct_cgrp_id]);
+	if (!cpuacct_css) {
+		rcu_read_unlock();
+		return -ENOENT;
+	}
+	css_get(cpuacct_css);
+	rcu_read_unlock();
+
+	ret = cpu_cgroup_proc_stat(cpu_css, cpuacct_css, sf);
+	css_put(cpuacct_css);
+	return ret;
+}

