[Devel] [PATCH RHEL COMMIT] ve/time: Use ve_relative_clock in times() syscall and /proc/[pid]/stat

Konstantin Khorenko khorenko at virtuozzo.com
Mon Oct 4 21:53:18 MSK 2021


The commit is pushed to "branch-rh9-5.14.vz9.1.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after ark-5.14
------>
commit 6ba2efa60c663563227ea00cbdd6bd07cbd57725
Author: Kirill Tkhai <ktkhai at virtuozzo.com>
Date:   Mon Oct 4 21:53:18 2021 +0300

    ve/time: Use ve_relative_clock in times() syscall and /proc/[pid]/stat
    
    Extracted from "Initial patch".
    
    Signed-off-by: Kirill Tkhai <ktkhai at virtuozzo.com>
    
    (cherry picked from commit c7401d1672b6e50cb782da033c50a083c3d8371a)
    Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
    
    +++
    ve/proc/time: Port diff-ve-proc-report-real_start_time-in-_proc_PID_stat-if-CONFIG_VE
    
    Author: Vladimir Davydov
    Email: vdavydov at parallels.com
    Subject: proc: report real_start_time in /proc/PID/stat if CONFIG_VE
    Date: Mon, 14 Oct 2013 19:04:59 +0400
    
    With !CONFIG_VE, real_start_time is reported, so the same value should
    be reported when CONFIG_VE is on.
    
    The difference between start_time and real_start_time is that the former
    is the monotonic time of process start while the latter is bootbased,
    i.e. includes time the system was suspended and uptime from the previous
    boot in case the system was vzrebooted. Reporting start_time instead of
    real_start_time leads to a wrong process etime reported by ps after
    vzreboot or system suspend/resume.
    
    https://jira.sw.ru/browse/PSBM-22925
    
    Signed-off-by: Vladimir Davydov <vdavydov at parallels.com>
    
    Acked-by: Stanislav Kinsbursky <skinsbursky at parallels.com>
    =============================================================================
    
    Author: Vladimir Davydov
    Email: vdavydov at parallels.com
    Subject: proc: fix negative start time in /proc/PID/stat
    Date: Mon, 14 Oct 2013 19:05:01 +0400
    
    Tasks inside a CT can have a negative start time, e.g. if the CT was
    migrated from another hardware node. In that case it is better to
    report 0, so as not to confuse userspace and to avoid triggering a
    warning in nsec_to_clock_t().
    
    https://jira.sw.ru/browse/PSBM-22925
    
    Signed-off-by: Vladimir Davydov <vdavydov at parallels.com>
    
    Acked-by: Stanislav Kinsbursky <skinsbursky at parallels.com>
    =============================================================================
    
    Related to https://jira.sw.ru/browse/PSBM-33650
    
    Signed-off-by: Vladimir Davydov <vdavydov at parallels.com>
    
    (cherry picked from commit 7cc4ea4fdd6d0f25cbcfb8093e418f6e9647fcc1)
    Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
    
    +++
    ve/time: rework times() syscall and /proc/[pid]/stat to handle u64 time offsets
    
    ve_struct.{start_time,real_start_time} are u64 now, change the code
    correspondingly.
    
    Drop duplicated fields start_timespec/real_start_timespec in ve_struct.
    
    Fixes: f2716576136d ("ve/time: Use ve_relative_clock in times() syscall
    and /proc/[pid]/stat")
    
    Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
    
    (cherry picked from vz7 commit eca790eaed527bae7029b4ae1cd557ce847ac6c0)
    Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
    
    Reviewed-by: Valeriy Vdovin <valeriy.vdovin at virtuozzo.com>
    
    Changes vz9:
    - split from process start time virtualization
    - switch to time namespace
    
    (cherry picked from vz8 commit 222870c58a3b4a284698e8cf7a692f7fea577b13)
    Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
    
    ====================
    Patchset description:
    
    ve/time: switch from our ve-time to native timenamespace
    
    https://jira.sw.ru/browse/PSBM-134393
    
    Since time namespaces are the new, mainline counterpart of ve-time,
    it's time to switch to them.
    
    Notes:
    1) ve-time does not need configuration on start, whereas a time
       namespace does (offset == -now).
    
    2) ve-time saved the container start time, whereas time namespaces
       save the offset between host start time and container start time
       (offset == ve_start_time - now).
    
    3) criu already knows how to handle time namespaces, but we need a
       compatibility layer to convert our ve.clock_* to offsets in the
       time namespace for pre-vz9 to vz9 migration.
    
    4) vdso time is already handled by time namespaces; a time namespace
       only virtualizes the vvar page, so it should not conflict with
       our vdso virtualization for ve.os_release.
    
    https://jira.sw.ru/browse/PSBM-134393
    
    Cyrill Gorcunov (1):
      ve: Add interface for ve::clock_[monotonic|bootbased] adjustment
    
    Kirill Tkhai (2):
      ve/time: Use ve_relative_clock in times() syscall and /proc/[pid]/stat
      ve: Virtualize sysinfo
    
    Pavel Tikhomirov (1):
      ve/time: remove our per-ve times in favor of mainstream
        time-namespaces
    
    Valeriy Vdovin (1):
      ve/proc: Added separate start time field to task_struct to show in
        container
---
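Note on the offset arithmetic: ve_get_monotonic() in this patch adds the
time namespace's monotonic offset to the host monotonic clock. The sketch
below restates that relation in plain userspace C with nanosecond integers.
All function names are hypothetical and exist only for illustration;
offset_for_reading_ns() is an assumption about how an offset could be
chosen on restore and is not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Note 1 of the patchset description: a freshly started container is
 * configured with offset == -now, so its monotonic clock starts at 0. */
int64_t fresh_container_offset_ns(int64_t host_now_ns)
{
	assert(host_now_ns >= 0);
	return -host_now_ns;
}

/* Illustrative assumption: the offset that makes the container clock read
 * wanted_ns at the moment the host clock reads host_now_ns. */
int64_t offset_for_reading_ns(int64_t wanted_ns, int64_t host_now_ns)
{
	return wanted_ns - host_now_ns;
}

/* Container-visible monotonic time is host time plus the namespace offset,
 * which is the addition ve_get_monotonic() performs with timespec64_add(). */
int64_t container_monotonic_ns(int64_t host_now_ns, int64_t offset_ns)
{
	return host_now_ns + offset_ns;
}
```

With these definitions, a container started when the host clock reads 5000 ns
sees 2000 ns once the host clock reaches 7000 ns.
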
 include/linux/ve.h | 23 +++++++++++++++++++++++
 kernel/sys.c       | 18 ++++++++++++++++++
 2 files changed, 41 insertions(+)
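
Note on negative start times: the commit message mentions reporting 0 when
a task's start time is negative (for example after migration). A minimal
userspace sketch of that clamping follows. start_time_to_ticks() is a
hypothetical name, and the division assumes the tick rate evenly divides
one second, as the common value of 100 does; it is not the kernel's
nsec_to_clock_t().

```c
#include <assert.h>
#include <stdint.h>

/* Convert a signed start time in nanoseconds to clock ticks.
 * Negative inputs are reported as 0 instead of being converted. */
uint64_t start_time_to_ticks(int64_t start_ns, long ticks_per_sec)
{
	uint64_t ns_per_tick;

	assert(ticks_per_sec > 0);
	if (start_ns < 0)
		return 0;

	/* Assumes ticks_per_sec divides 1e9 evenly (true for 100). */
	ns_per_tick = 1000000000ULL / (uint64_t)ticks_per_sec;
	return (uint64_t)start_ns / ns_per_tick;
}
```
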

diff --git a/include/linux/ve.h b/include/linux/ve.h
index 65439ab1302e..6e3975ed2cf6 100644
--- a/include/linux/ve.h
+++ b/include/linux/ve.h
@@ -122,6 +122,29 @@ static inline struct ve_struct *css_to_ve(struct cgroup_subsys_state *css)
 
 extern struct cgroup_subsys_state *ve_get_init_css(struct ve_struct *ve, int subsys_id);
 
+static inline u64 ve_get_monotonic(struct ve_struct *ve)
+{
+	struct timespec64 tp = ns_to_timespec64(0);
+	struct time_namespace *time_ns;
+	struct nsproxy *ve_ns;
+
+	rcu_read_lock();
+	ve_ns = rcu_dereference(ve->ve_ns);
+	if (!ve_ns) {
+		rcu_read_unlock();
+		goto out;
+	}
+
+	time_ns = get_time_ns(ve_ns->time_ns);
+	rcu_read_unlock();
+
+	ktime_get_ts64(&tp);
+	tp = timespec64_add(tp, time_ns->offsets.monotonic);
+	put_time_ns(time_ns);
+out:
+	return timespec64_to_ns(&tp);
+}
+
 static u64 ve_get_uptime(struct ve_struct *ve)
 {
 	struct timespec64 tp = ns_to_timespec64(0);
diff --git a/kernel/sys.c b/kernel/sys.c
index ae566d26ab6e..3d4b35e0e636 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -986,6 +986,19 @@ static void do_sys_times(struct tms *tms)
 	tms->tms_cstime = nsec_to_clock_t(cstime);
 }
 
+#ifdef CONFIG_VE
+static u64 ve_relative_clock(void)
+{
+	u64 ve_now = ve_get_monotonic(get_exec_env());
+
+	/* VE not started, fallback to host time */
+	if (!ve_now)
+		ve_now = ktime_get_ns();
+
+	return nsec_to_clock_t(ve_now);
+}
+#endif
+
 SYSCALL_DEFINE1(times, struct tms __user *, tbuf)
 {
 	if (tbuf) {
@@ -995,8 +1008,13 @@ SYSCALL_DEFINE1(times, struct tms __user *, tbuf)
 		if (copy_to_user(tbuf, &tmp, sizeof(struct tms)))
 			return -EFAULT;
 	}
+#ifndef CONFIG_VE
 	force_successful_syscall_return();
 	return (long) jiffies_64_to_clock_t(get_jiffies_64());
+#else
+	force_successful_syscall_return();
+	return (long) ve_relative_clock();
+#endif
 }
 
 #ifdef CONFIG_COMPAT
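
Note for userspace consumers: the value returned by times() is a tick count
from an arbitrary starting point, so only the difference between two return
values is meaningful. With this patch and CONFIG_VE, that value comes from
the container's monotonic clock. The sketch below shows one way a caller
might turn two readings into seconds; now_ticks() and elapsed_seconds() are
hypothetical helper names.

```c
#include <assert.h>
#include <sys/times.h>
#include <unistd.h>

/* Read the current times() tick counter; the struct tms contents are
 * not needed here, only the return value. */
clock_t now_ticks(void)
{
	struct tms buf;

	return times(&buf);
}

/* Seconds elapsed between two times() return values, given the tick
 * rate reported by sysconf(_SC_CLK_TCK). */
double elapsed_seconds(clock_t start, clock_t end, long ticks_per_sec)
{
	assert(ticks_per_sec > 0);
	return (double)(end - start) / (double)ticks_per_sec;
}
```

A caller would take one now_ticks() reading before and one after the work
being measured and pass both to elapsed_seconds() together with
sysconf(_SC_CLK_TCK).
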

