[Devel] [PATCH RHEL COMMIT] ve: Add interface for ve::clock_[monotonic|bootbased] adjustment
Konstantin Khorenko
khorenko at virtuozzo.com
Mon Oct 4 16:35:05 MSK 2021
reverted
https://jira.sw.ru/browse/PSBM-134393
--
Best regards,
Konstantin Khorenko,
Virtuozzo Linux Kernel Team
On 01.10.2021 19:38, Konstantin Khorenko wrote:
> The commit is pushed to "branch-rh9-5.14.vz9.1.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
> after ark-5.14
> ------>
> commit e38514e2dacba61d6736325b26786cf49fe31eff
> Author: Cyrill Gorcunov <gorcunov at virtuozzo.com>
> Date: Fri Oct 1 19:38:40 2021 +0300
>
> ve: Add interface for ve::clock_[monotonic|bootbased] adjustment
>
> These two members represent the monotonic and bootbased clocks for
> the container's uptime. When a container is suspended (or is being
> moved to another node) we treat the monotonic and bootbased
> clocks as stopped, so we need to account for the elapsed time
> on restore and adjust the members in question.
>
> Moreover, these timestamps are involved in posix-timers
> setup, so once an application tries to set up monotonic clocks
> after the restore (with an absolute time specification) we
> adjust the values as well.
>
> The application which migrates a container must fetch
> the current settings from /sys/fs/cgroup/ve/$VE/ve.real_start_timespec
> and /sys/fs/cgroup/ve/$VE/ve.start_timespec, then write them
> back on restore.
>
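A minimal userspace sketch of the save/restore step described above. It assumes the entries carry the "<sec> <nsec>" text format used by ve_ts_read()/ve_ts_write() in the diff, and that they are exposed under the names the diff registers (ve.clock_monotonic and ve.clock_bootbased, with the "ve." prefix being an assumption). The helper names ve_clock_read() and ve_clock_write() are hypothetical and not part of any existing tool:

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: read "<sec> <nsec>" from a ve cgroup clock entry. */
static int ve_clock_read(const char *path, struct timespec *ts)
{
    long long sec;
    long nsec;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (fscanf(f, "%lld %ld", &sec, &nsec) != 2) {
        fclose(f);
        return -1;
    }
    fclose(f);
    ts->tv_sec = (time_t)sec;
    ts->tv_nsec = nsec;
    return 0;
}

/* Hypothetical helper: write a saved value back in the same format. */
static int ve_clock_write(const char *path, const struct timespec *ts)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fprintf(f, "%lld %ld", (long long)ts->tv_sec, (long)ts->tv_nsec) > 0;
    /* The kernel reports -EINVAL at write time, so check fclose() too. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A migration tool would presumably call ve_clock_read() on both entries before suspend and ve_clock_write() with the saved values once the container's cgroup exists on the destination node.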
> https://jira.sw.ru/browse/PSBM-41311
> https://jira.sw.ru/browse/PSBM-41406
>
> v2:
> - use clock_[monotonic|bootbased] for cgroup entry names instead
>
> Original-by: Andrew Vagin <avagin at openvz.org>
> Signed-off-by: Cyrill Gorcunov <gorcunov at virtuozzo.com>
>
> Reviewed-by: Vladimir Davydov <vdavydov at virtuozzo.com>
>
> (cherry picked from vz7 commit 43f4b0c752abd84aa1b346373d152941123d2446
> ("ve: Add interface for @start_timespec and @real_start_timespec
> adjustmen"))
>
> Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
>
> +++
> ve/time: Limit values to write in ve::clock_[monotonic|bootbased]
>
> What do we mean when we write a value XXX into, say, ve::ve.clock_bootbased?
> We mean that "up to now the CT has already worked for XXX secs/nsecs".
> And we store the delta between the Node's "now" and XXX into ve->start_time_real.
>
> If the CT has worked less than the current Node, ve->start_time_real will
> contain a positive value and we'll subtract it from the Node's "now" each
> time we need to get the time since the CT start.
>
> If the CT has worked longer than the current Node (say, the CT has been migrated
> from another HN), the stored delta will be negative and thus we'll "add"
> more time to the Node's "now".
>
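The arithmetic described above can be sketched in plain C. This is a userspace model, not the kernel code: the function names are hypothetical, and the values are kept in unsigned 64-bit nanoseconds as in the diff, so a "negative" delta is simply a wrapped value that still yields the right result on subtraction:

```c
#include <stdint.h>

/* Models the write side: store host "now" minus the CT uptime written. */
static uint64_t ve_start_from_uptime(uint64_t host_now_ns,
                                     uint64_t ct_uptime_ns)
{
    return host_now_ns - ct_uptime_ns;
}

/* Models the read side: CT uptime is host "now" minus the stored start. */
static uint64_t ve_uptime(uint64_t host_now_ns, uint64_t start_ns)
{
    return host_now_ns - start_ns;
}
```

For example, writing an uptime of 500 s on a node that has been up for 100 s stores a wrapped start value, and a read 30 s later returns 530 s, because unsigned subtraction in C is modular.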
> So then what do we want to limit?
> 1. Negative values written to ve::clock_[monotonic|bootbased].
> Indeed, we can hardly imagine that the CT has been started but the
> time since its start is negative.
>
> 2. A big positive value, such that some time later, when we read from
> ve::clock_[monotonic|bootbased], we get an overflowed value.
>
> Both these checks are performed by timespec_valid_strict().
>
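A userspace approximation of the two limits listed above, for illustration only. It mirrors the checks as I understand timespec64_valid_strict() to perform them (non-negative seconds, nanoseconds within one second, seconds small enough that the nanosecond total fits in a signed 64-bit value); the name ve_clock_value_ok() and the SEC_MAX macro are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL
/* Largest second count whose nanosecond total still fits in int64_t. */
#define SEC_MAX (INT64_MAX / NSEC_PER_SEC)

static bool ve_clock_value_ok(int64_t sec, int64_t nsec)
{
    /* Limit 1: the time since CT start cannot be negative. */
    if (sec < 0)
        return false;
    /* The nanosecond part must be a proper fraction of a second. */
    if (nsec < 0 || nsec >= NSEC_PER_SEC)
        return false;
    /* Limit 2: reject values that could overflow later on read. */
    if (sec >= SEC_MAX)
        return false;
    return true;
}
```

With this sketch, "-1 0" and "1 1000000000" are rejected, while "3600 0" is accepted; in the patch a rejected value makes the write return -EINVAL.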
> Fixes: 25cab3041305 ("ve: Add interface for
> ve::clock_[monotonic|bootbased] adjustment")
>
> Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
>
> Reviewed-by: Kirill Tkhai <ktkhai at virtuozzo.com>
>
> Cherry-picked from vz8 commit ad5d9cc5fd62 ("ve: Add interface for
> ve::clock_[monotonic|bootbased] adjustment").
>
> Ported to timespec64.
> Followed ve->real_start_time -> ve->start_boottime rename.
> Followed ktime_get_boot_ns() -> ktime_get_boottime_ns() rename.
>
> Signed-off-by: Nikita Yushchenko <nikita.yushchenko at virtuozzo.com>
> ---
> kernel/ve/ve.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 76 insertions(+)
>
> diff --git a/kernel/ve/ve.c b/kernel/ve/ve.c
> index e3a07d4c9fe4..f3df12f8638b 100644
> --- a/kernel/ve/ve.c
> +++ b/kernel/ve/ve.c
> @@ -955,6 +955,68 @@ static ssize_t ve_os_release_write(struct kernfs_open_file *of, char *buf,
>  	return ret ? ret : nbytes;
>  }
>
> +enum {
> +	VE_CF_CLOCK_MONOTONIC,
> +	VE_CF_CLOCK_BOOTBASED,
> +};
> +
> +static int ve_ts_read(struct seq_file *sf, void *v)
> +{
> +	struct ve_struct *ve = css_to_ve(seq_css(sf));
> +	struct timespec64 ts;
> +	u64 now, delta;
> +
> +	switch (seq_cft(sf)->private) {
> +	case VE_CF_CLOCK_MONOTONIC:
> +		now = ktime_get_ns();
> +		delta = ve->start_time;
> +		break;
> +	case VE_CF_CLOCK_BOOTBASED:
> +		now = ktime_get_boottime_ns();
> +		delta = ve->start_boottime;
> +		break;
> +	default:
> +		now = delta = 0;
> +		WARN_ON_ONCE(1);
> +		break;
> +	}
> +
> +	ts = ns_to_timespec64(now - delta);
> +	seq_printf(sf, "%lld %ld", ts.tv_sec, ts.tv_nsec);
> +	return 0;
> +}
> +
> +static ssize_t ve_ts_write(struct kernfs_open_file *of, char *buf,
> +			   size_t nbytes, loff_t off)
> +{
> +	struct ve_struct *ve = css_to_ve(of_css(of));
> +	struct timespec64 delta;
> +	u64 delta_ns, now, *target;
> +
> +	if (sscanf(buf, "%lld %ld", &delta.tv_sec, &delta.tv_nsec) != 2)
> +		return -EINVAL;
> +	if (!timespec64_valid_strict(&delta))
> +		return -EINVAL;
> +	delta_ns = timespec64_to_ns(&delta);
> +
> +	switch (of_cft(of)->private) {
> +	case VE_CF_CLOCK_MONOTONIC:
> +		now = ktime_get_ns();
> +		target = &ve->start_time;
> +		break;
> +	case VE_CF_CLOCK_BOOTBASED:
> +		now = ktime_get_boottime_ns();
> +		target = &ve->start_boottime;
> +		break;
> +	default:
> +		WARN_ON_ONCE(1);
> +		return -EINVAL;
> +	}
> +
> +	*target = now - delta_ns;
> +	return nbytes;
> +}
> +
>  static struct cftype ve_cftypes[] = {
>
>  	{
> @@ -981,6 +1043,20 @@ static struct cftype ve_cftypes[] = {
>  		.read_u64 = ve_reatures_read,
>  		.write_u64 = ve_reatures_write,
>  	},
> +	{
> +		.name = "clock_monotonic",
> +		.flags = CFTYPE_NOT_ON_ROOT,
> +		.seq_show = ve_ts_read,
> +		.write = ve_ts_write,
> +		.private = VE_CF_CLOCK_MONOTONIC,
> +	},
> +	{
> +		.name = "clock_bootbased",
> +		.flags = CFTYPE_NOT_ON_ROOT,
> +		.seq_show = ve_ts_read,
> +		.write = ve_ts_write,
> +		.private = VE_CF_CLOCK_BOOTBASED,
> +	},
>  	{
>  		.name = "netns_max_nr",
>  		.flags = CFTYPE_NOT_ON_ROOT,
> .
>