[Devel] [PATCH vz9 09/23] ve/posix-timers: reference ve monotonic clock from ve start (v2)
Nikita Yushchenko
nikita.yushchenko at virtuozzo.com
Fri Oct 1 18:53:17 MSK 2021
From: Kirill Tkhai <ktkhai at parallels.com>
So that CLOCK_MONOTONIC will be monotonic even if ve is migrated to
another hw node.
Note, translating ve <-> abs time in clock_settime and timer_settime is
not necessary because (1) clock_settime won't set monotonic clock and
(2) timer_gettime always returns relative time.
https://jira.sw.ru/browse/PSBM-13860
diff-posix_timers-reference-ct-monotonic-clock-from-ct-start
Signed-off-by: Vladimir Davydov <vdavydov at parallels.com>
Acked-by: Pavel Emelyanov <xemul at parallels.com>
Signed-off-by: Kirill Tkhai <ktkhai at parallels.com>
+++
ve/posix-timers: reference ve monotonic clock from start in clock_nanosleep
This is an addition to diff-posix_timers-reference-ct-monotonic-clock-from-ct-start
Otherwise, apps that use sys_clock_nanosleep() to suspend their
execution can hang after ve migration.
diff-posix-timers-reference-ve-monotonic-clock-from-ve-start-in-clock_nanosleep
Signed-off-by: Vladimir Davydov <vdavydov at parallels.com>
Acked-by: Konstantin Khlebnikov <khlebnikov at openvz.org>
Acked-by: Pavel Emelyanov <xemul at parallels.com>
Signed-off-by: Kirill Tkhai <ktkhai at parallels.com>
+++
timers: Port diff-ve-timers-convert-ve-monotonic-to-abs-time-when-setting-timerfd-2
Need this for docker, as sometimes systemd-tmpfiles-clean.timer inside
a PCS7 CT is spamming dbus with requests to start corresponding service.
And at the same time Docker tries to create cgroup for container and
attach it to hierarchies like memory and blkio.
That is because systemd timer was triggered with non-virtualized timerfd
using plain host clock but check that timer is successfull uses
virtualized clock_gettime and don't pass before proper(in-container)
timer activation. And timers charges again and again starts service
got in busy loop.
https://jira.sw.ru/browse/PSBM-34017
v2: move the stubs to ve.h
Port the following RH6 commit:
Author: Vladimir Davydov
Email: vdavydov at parallels.com
Subject: fs: convert ve monotonic to abs time when setting timerfd
Date: Fri, 15 Feb 2013 11:57:09 +0400
* [timers] corrected TFD_TIMER_ABSTIME timer handling,
the issue led to high cpu usage inside a Fedora 18 CT
by 'init' process (PSBM-18284)
Monotonic time inside container, as it can be obtained using various
system calls such as clock_gettime, is reported since start of the container,
not since start of the whole system. This was made in order to avoid time
issues while a container is migrated between different physical hosts, but this
also introduced a lot of problems in time- related system calls because
absolute monotonic time, which is in fact relative to container, passed to those
system calls must be converted to system-wide monotonic time, which is used by
kernel hrtimers.
One of those buggy system calls is timerfd_settime which accepts as an
argument absolute time if flag TFD_TIMER_ABSTIME is specified.
The patch fixes it by converting container monotonic time to system-
wide monotonic time using the monotonic_ve_to_abs() function, which was
introduced earlier and is now exported for that reason.
https://jira.sw.ru/browse/PSBM-18284
Signed-off-by: Vladimir Davydov <vdavydov at parallels.com>
Signed-off-by: Pavel Tikhomirov <ptikhomirov at virtuozzo.com>
Signed-off-by: Kirill Tkhai <ktkhai at odin.com>
Reviewed-by: Vladimir Davydov <vdavydov at parallels.com>
(cherry picked from vz7 commit 869542c24c41c0578b47d2ef83cfa63427e0e5e1)
Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
+++
timers should not get negative argument
This patch fixes 25-sec delay on login into systemd based containers.
Userspace application can set timer for past
and expect that the timer will be expired immediately.
This can do not work as expected inside migrated containers.
Translated argument provided to timer can become negative,
and according timer will sleep a very long time.
https://jira.sw.ru/browse/PSBM-48475
CC: Vladimir Davydov <vdavydov at virtuozzo.com>
CC: Konstantin Khorenko <khorenko at virtuozzo.com>
Signed-off-by: Vasily Averin <vvs at virtuozzo.com>
Acked-by: Cyrill Gorcunov <gorcunov at openvz.org>
(cherry picked from vz7 commit a71fa19facb00472e47760255ab2e6fa16885732)
Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
(cherry-picked from vz8 commit 4dad672974ab ("ve/posix-timers: reference
ve monotonic clock from ve start (v2)"))
Signed-off-by: Nikita Yushchenko <nikita.yushchenko at virtuozzo.com>
---
fs/timerfd.c | 8 ++++--
include/linux/ve.h | 8 ++++++
kernel/time/posix-timers.c | 54 ++++++++++++++++++++++++++++++++++++--
3 files changed, 66 insertions(+), 4 deletions(-)
diff --git a/fs/timerfd.c b/fs/timerfd.c
index c5509d2448e3..4f782f3690d0 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -27,6 +27,7 @@
#include <linux/compat.h>
#include <linux/rcupdate.h>
#include <linux/time_namespace.h>
+#include <linux/ve.h>
struct timerfd_ctx {
union {
@@ -435,8 +436,8 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
return ufd;
}
-static int do_timerfd_settime(int ufd, int flags,
- const struct itimerspec64 *new,
+static int do_timerfd_settime(int ufd, int flags,
+ struct itimerspec64 *new,
struct itimerspec64 *old)
{
struct fd f;
@@ -500,6 +501,9 @@ static int do_timerfd_settime(int ufd, int flags,
/*
* Re-program the timer to the new value ...
*/
+ if ((flags & TFD_TIMER_ABSTIME) &&
+ (new->it_value.tv_sec || new->it_value.tv_nsec))
+ monotonic_ve_to_abs(ctx->clockid, &new->it_value);
ret = timerfd_setup(ctx, flags, new);
spin_unlock_irq(&ctx->wqh.lock);
diff --git a/include/linux/ve.h b/include/linux/ve.h
index 552fa577e2f9..19a590bc86d4 100644
--- a/include/linux/ve.h
+++ b/include/linux/ve.h
@@ -97,6 +97,9 @@ static inline struct ve_struct *css_to_ve(struct cgroup_subsys_state *css)
extern struct cgroup_subsys_state *ve_get_init_css(struct ve_struct *ve, int subsys_id);
+extern void monotonic_abs_to_ve(clockid_t which_clock, struct timespec64 *tp);
+extern void monotonic_ve_to_abs(clockid_t which_clock, struct timespec64 *tp);
+
#define ve_feature_set(ve, f) \
!!((ve)->features & VE_FEATURE_##f)
@@ -140,6 +143,11 @@ static inline struct cgroup *cgroup_get_ve_root1(struct cgroup *cgrp)
static inline int vz_security_family_check(struct net *net, int family, int type) { return 0; }
static inline int vz_security_protocol_check(struct net *net, int protocol) { return 0; }
+static inline void monotonic_abs_to_ve(clockid_t which_clock,
+ struct timespec64 *tp) { }
+static inline void monotonic_ve_to_abs(clockid_t which_clock,
+ struct timepsec64 *tp) { }
+
#endif /* CONFIG_VE */
#endif /* _LINUX_VE_H */
diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 7363f81dc31a..3f91756f5421 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -31,6 +31,7 @@
#include <linux/compat.h>
#include <linux/nospec.h>
#include <linux/time_namespace.h>
+#include <linux/ve.h>
#include "timekeeping.h"
#include "posix-timers.h"
@@ -56,6 +57,46 @@ static const struct k_clock * const posix_clocks[];
static const struct k_clock *clockid_to_kclock(const clockid_t id);
static const struct k_clock clock_realtime, clock_monotonic;
+#define clock_is_monotonic(which_clock) \
+ ((which_clock) == CLOCK_MONOTONIC || \
+ (which_clock) == CLOCK_MONOTONIC_RAW || \
+ (which_clock) == CLOCK_MONOTONIC_COARSE)
+
+#ifdef CONFIG_VE
+static struct timespec64 zero_time;
+
+void monotonic_abs_to_ve(clockid_t which_clock, struct timespec64 *tp)
+{
+ struct timespec64 offset;
+
+ if (!clock_is_monotonic(which_clock))
+ return;
+
+ offset = ns_to_timespec64(get_exec_env()->start_time);
+ set_normalized_timespec64(tp,
+ tp->tv_sec - offset.tv_sec,
+ tp->tv_nsec - offset.tv_nsec);
+}
+
+void monotonic_ve_to_abs(clockid_t which_clock, struct timespec64 *tp)
+{
+ struct timespec64 offset;
+
+ if (!clock_is_monotonic(which_clock))
+ return;
+
+ offset = ns_to_timespec64(get_exec_env()->start_time);
+ set_normalized_timespec64(tp,
+ tp->tv_sec + offset.tv_sec,
+ tp->tv_nsec + offset.tv_nsec);
+
+ if (timespec64_compare(tp, &zero_time) <= 0) {
+ tp->tv_sec = 0;
+ tp->tv_nsec = 1;
+ }
+}
+#endif
+
/*
* we assume that the new SIGEV_THREAD_ID shares no bits with the other
* SIGEV values. Here we put out an error if this assumption fails.
@@ -916,6 +957,9 @@ static int do_timer_settime(timer_t timer_id, int tmr_flags,
if (!timr)
return -EINVAL;
+ if ((flags & TIMER_ABSTIME) &&
+ (new_spec64->it_value.tv_sec || new_spec64->it_value.tv_nsec))
+ monotonic_ve_to_abs(timr->it_clock, &new_spec64->it_value);
kc = timr->kclock;
if (WARN_ON_ONCE(!kc || !kc->timer_set))
error = -EINVAL;
@@ -1091,6 +1135,7 @@ SYSCALL_DEFINE2(clock_gettime, const clockid_t, which_clock,
error = kc->clock_get_timespec(which_clock, &kernel_tp);
+ monotonic_abs_to_ve(which_clock, &kernel_tp);
if (!error && put_timespec64(&kernel_tp, tp))
error = -EFAULT;
@@ -1173,6 +1218,7 @@ SYSCALL_DEFINE2(clock_gettime32, clockid_t, which_clock,
err = kc->clock_get_timespec(which_clock, &ts);
+ monotonic_abs_to_ve(which_clock, &ts);
if (!err && put_old_timespec32(&ts, tp))
err = -EFAULT;
@@ -1259,8 +1305,10 @@ SYSCALL_DEFINE4(clock_nanosleep, const clockid_t, which_clock, int, flags,
if (!timespec64_valid(&t))
return -EINVAL;
- if (flags & TIMER_ABSTIME)
+ if (flags & TIMER_ABSTIME) {
+ monotonic_ve_to_abs(which_clock, &t);
rmtp = NULL;
+ }
current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE;
current->restart_block.nanosleep.rmtp = rmtp;
@@ -1286,8 +1334,10 @@ SYSCALL_DEFINE4(clock_nanosleep_time32, clockid_t, which_clock, int, flags,
if (!timespec64_valid(&t))
return -EINVAL;
- if (flags & TIMER_ABSTIME)
+ if (flags & TIMER_ABSTIME) {
+ monotonic_ve_to_abs(which_clock, &t);
rmtp = NULL;
+ }
current->restart_block.nanosleep.type = rmtp ? TT_COMPAT : TT_NONE;
current->restart_block.nanosleep.compat_rmtp = rmtp;
--
2.30.2
More information about the Devel
mailing list