[Devel] [PATCH RHEL COMMIT] ve: Add interface for ve::clock_[monotonic|bootbased] adjustment

Konstantin Khorenko khorenko at virtuozzo.com
Mon Oct 4 16:35:05 MSK 2021


reverted

https://jira.sw.ru/browse/PSBM-134393

--
Best regards,

Konstantin Khorenko,
Virtuozzo Linux Kernel Team

On 01.10.2021 19:38, Konstantin Khorenko wrote:
> The commit is pushed to "branch-rh9-5.14.vz9.1.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
> after ark-5.14
> ------>
> commit e38514e2dacba61d6736325b26786cf49fe31eff
> Author: Cyrill Gorcunov <gorcunov at virtuozzo.com>
> Date:   Fri Oct 1 19:38:40 2021 +0300
> 
>      ve: Add interface for ve::clock_[monotonic|bootbased] adjustment
>      
>      These two members represent the monotonic and bootbased clocks
>      for the container's uptime. When a container is suspended (or is
>      moving to another node) we treat the monotonic and bootbased
>      clocks as stopped, so we need to account for the delta time
>      on restore and adjust these members accordingly.
>      
>      Moreover, these timestamps are involved in posix-timers
>      setup, so once an application tries to set up monotonic clocks
>      after the restore (with an absolute time specification) we
>      adjust the values as well.
>      
>      The application which migrates a container must fetch
>      the current settings from /sys/fs/cgroup/ve/$VE/ve.real_start_timespec
>      and /sys/fs/cgroup/ve/$VE/ve.start_timespec, then write them
>      back on restore.
>      
>      https://jira.sw.ru/browse/PSBM-41311
>      https://jira.sw.ru/browse/PSBM-41406
>      
>      v2:
>       - use clock_[monotonic|bootbased] for cgroup entry names instead
>      
>      Original-by: Andrew Vagin <avagin at openvz.org>
>      Signed-off-by: Cyrill Gorcunov <gorcunov at virtuozzo.com>
>      
>      Reviewed-by: Vladimir Davydov <vdavydov at virtuozzo.com>
>      
>      (cherry picked from vz7 commit 43f4b0c752abd84aa1b346373d152941123d2446
>      ("ve: Add interface for @start_timespec and @real_start_timespec
>      adjustmen"))
>      
>      Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
>      
>      +++
>      ve/time: Limit values to write in ve::clock_[monotonic|bootbased]
>      
>      What do we mean when we write a value XXX into, say, ve::ve.clock_bootbased?
>      We mean that "up to now the CT has already worked for XXX secs/usecs".
>      And we store the delta between Node "now" and XXX into ve->start_time_real.
>      
>      If the CT worked less than the current Node, ve->start_time_real will
>      contain a positive value and we'll subtract it from Node's "now" each
>      time we need to get the time since the CT start.
>      
>      If the CT worked longer than the current Node (say, the CT has been
>      migrated from another HN), the stored delta will be negative and thus
>      we'll "add" more time to Node's "now".
>      
>      So then what do we want to limit?
>      1. Negative values written to ve::clock_[monotonic|bootbased].
>         Indeed, we can hardly imagine that the CT has been started, but the
>         time since its start is negative.
>      
>      2. A big positive value, so some time later when we read from
>         ve::clock_[monotonic|bootbased] we get an overflowed value.
>      
>      Both these checks are performed by timespec_valid_strict().
>      
>      Fixes: 25cab3041305 ("ve: Add interface for
>      ve::clock_[monotonic|bootbased] adjustment")
>      
>      Signed-off-by: Konstantin Khorenko <khorenko at virtuozzo.com>
>      
>      Reviewed-by: Kirill Tkhai <ktkhai at virtuozzo.com>
>      
>      (Cherry-picked from vz8 commit ad5d9cc5fd62 ("ve: Add interface for
>      ve::clock_[monotonic|bootbased] adjustment")).
>      
>      Ported to timespec64.
>      Followed ve->real_start_time -> ve->start_boottime rename.
>      Followed ktime_get_boot_ns() -> ktime_get_boottime_ns() rename.
>      
>      Signed-off-by: Nikita Yushchenko <nikita.yushchenko at virtuozzo.com>
> ---
>   kernel/ve/ve.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 76 insertions(+)
> 
> diff --git a/kernel/ve/ve.c b/kernel/ve/ve.c
> index e3a07d4c9fe4..f3df12f8638b 100644
> --- a/kernel/ve/ve.c
> +++ b/kernel/ve/ve.c
> @@ -955,6 +955,68 @@ static ssize_t ve_os_release_write(struct kernfs_open_file *of, char *buf,
>   	return ret ? ret : nbytes;
>   }
>   
> +enum {
> +	VE_CF_CLOCK_MONOTONIC,
> +	VE_CF_CLOCK_BOOTBASED,
> +};
> +
> +static int ve_ts_read(struct seq_file *sf, void *v)
> +{
> +	struct ve_struct *ve = css_to_ve(seq_css(sf));
> +	struct timespec64 ts;
> +	u64 now, delta;
> +
> +	switch (seq_cft(sf)->private) {
> +		case VE_CF_CLOCK_MONOTONIC:
> +			now = ktime_get_ns();
> +			delta = ve->start_time;
> +			break;
> +		case VE_CF_CLOCK_BOOTBASED:
> +			now = ktime_get_boottime_ns();
> +			delta = ve->start_boottime;
> +			break;
> +		default:
> +			now = delta = 0;
> +			WARN_ON_ONCE(1);
> +			break;
> +	}
> +
> +	ts = ns_to_timespec64(now - delta);
> +	seq_printf(sf, "%lld %ld", ts.tv_sec, ts.tv_nsec);
> +	return 0;
> +}
> +
> +static ssize_t ve_ts_write(struct kernfs_open_file *of, char *buf,
> +			   size_t nbytes, loff_t off)
> +{
> +	struct ve_struct *ve = css_to_ve(of_css(of));
> +	struct timespec64 delta;
> +	u64 delta_ns, now, *target;
> +
> +	if (sscanf(buf, "%lld %ld", &delta.tv_sec, &delta.tv_nsec) != 2)
> +		return -EINVAL;
> +	if (!timespec64_valid_strict(&delta))
> +		return -EINVAL;
> +	delta_ns = timespec64_to_ns(&delta);
> +
> +	switch (of_cft(of)->private) {
> +		case VE_CF_CLOCK_MONOTONIC:
> +			now = ktime_get_ns();
> +			target = &ve->start_time;
> +			break;
> +		case VE_CF_CLOCK_BOOTBASED:
> +			now = ktime_get_boottime_ns();
> +			target = &ve->start_boottime;
> +			break;
> +		default:
> +			WARN_ON_ONCE(1);
> +			return -EINVAL;
> +	}
> +
> +	*target = now - delta_ns;
> +	return nbytes;
> +}
> +
>   static struct cftype ve_cftypes[] = {
>   
>   	{
> @@ -981,6 +1043,20 @@ static struct cftype ve_cftypes[] = {
>   		.read_u64		= ve_reatures_read,
>   		.write_u64		= ve_reatures_write,
>   	},
> +	{
> +		.name			= "clock_monotonic",
> +		.flags			= CFTYPE_NOT_ON_ROOT,
> +		.seq_show		= ve_ts_read,
> +		.write			= ve_ts_write,
> +		.private		= VE_CF_CLOCK_MONOTONIC,
> +	},
> +	{
> +		.name			= "clock_bootbased",
> +		.flags			= CFTYPE_NOT_ON_ROOT,
> +		.seq_show		= ve_ts_read,
> +		.write			= ve_ts_write,
> +		.private		= VE_CF_CLOCK_BOOTBASED,
> +	},
>   	{
>   		.name			= "netns_max_nr",
>   		.flags			= CFTYPE_NOT_ON_ROOT,
> .
> 
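For readers of the archive, the offset arithmetic described in the quoted
commit message can be sketched in plain userspace C. This is only a model
under stated assumptions, not the kernel code: struct model_ts, MODEL_SEC_MAX
and all model_*() helpers are hypothetical names, and the validity check only
approximates what timespec64_valid_strict() is understood to do. It shows that
storing "now - uptime" in a u64 and later reading "now - start" gives the
expected uptime even when the CT has worked longer than the Node, because the
unsigned subtraction wraps consistently in both directions.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL
/* Assumption: mirrors the kernel's KTIME_SEC_MAX, i.e. the largest number
 * of seconds that still fits in signed 64-bit nanoseconds. */
#define MODEL_SEC_MAX (INT64_MAX / NSEC_PER_SEC)

/* Hypothetical stand-in for struct timespec64. */
struct model_ts {
	int64_t tv_sec;
	long tv_nsec;
};

/* Reject negative uptimes, unnormalized nanoseconds, and values that
 * would overflow once converted to nanoseconds. */
static inline bool model_ts_valid_strict(const struct model_ts *ts)
{
	if (ts->tv_sec < 0 || ts->tv_sec >= MODEL_SEC_MAX)
		return false;
	return ts->tv_nsec >= 0 && ts->tv_nsec < NSEC_PER_SEC;
}

/* Convert an already validated timespec to nanoseconds. */
static inline uint64_t model_ts_to_ns(const struct model_ts *ts)
{
	return (uint64_t)ts->tv_sec * NSEC_PER_SEC + (uint64_t)ts->tv_nsec;
}

/* Write side: remember the CT "start" expressed on this Node's clock.
 * Wraps modulo 2^64 when the CT uptime exceeds the Node's "now". */
static inline uint64_t model_store_start(uint64_t now_ns, uint64_t uptime_ns)
{
	return now_ns - uptime_ns;
}

/* Read side: CT uptime as seen at now_ns; the wrap cancels out here. */
static inline uint64_t model_uptime(uint64_t now_ns, uint64_t start_ns)
{
	return now_ns - start_ns;
}
```

For example, a CT with 250 s of uptime restored on a Node whose clock reads
100 s stores a wrapped start value; reading 30 s later yields 280 s.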

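The fetch-and-write-back step that the commit message asks of a migration
tool might look roughly like the sketch below. Everything here is an
assumption for illustration: ve_clock_read() and ve_clock_write() are
hypothetical helpers, and the file path (presumably something like
/sys/fs/cgroup/ve/$VE/ve.clock_monotonic, given the cftype names in the diff)
must be supplied by the caller. Since the patch is marked as reverted above,
the files may not exist on this branch at all.

```c
#include <stdio.h>

/* Read "<sec> <nsec>" from a clock file; returns 0 on success, -1 on error. */
static inline int ve_clock_read(const char *path, long long *sec, long *nsec)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	/* Same "%lld %ld" layout that ve_ts_read() prints. */
	ok = fscanf(f, "%lld %ld", sec, nsec) == 2;
	fclose(f);
	return ok ? 0 : -1;
}

/* Write "<sec> <nsec>" back on restore; returns 0 on success, -1 on error. */
static inline int ve_clock_write(const char *path, long long sec, long nsec)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%lld %ld", sec, nsec) > 0;
	/* fclose() flushes the buffer, so a value rejected by the kernel
	 * (for example with -EINVAL) is reported at this point. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A tool would call ve_clock_read() on the source node at checkpoint time and
ve_clock_write() with the saved pair on the destination node at restore time,
once for each of the two clock files.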
