[Devel] [PATCH vz9 1/2] prctl: add option to manage memory allocation scopes

Pavel Tikhomirov ptikhomirov at virtuozzo.com
Fri Apr 7 07:31:05 MSK 2023



On 05.04.2023 01:56, Alexander Atanasov wrote:
> Currently there is no way to hint the kernel to avoid triggering
> page reclaims. This is useful in networked file systems,
> which can deadlock in the synchronous reclaim path and to reduce
> jitter when streaming which can be induced by a synchronouse reclaim.
> 
> To aid the userspace add interface to manage PF_MEMALLOC, PF_MEMALLOC_NOIO,
> PF_MEMALLOC_NOFS, PF_MEMALLOC_PIN flags via prctl.
> 
> Interface is defined via option PR_MEMALLOC_FLAGS and respective
> PR_MEMALLOC_GET_FLAGS, PR_MEMALLOC_SET_FLAGS and PR_MEMALLOC_CLEAR_FLAGS.
> Flag values used are defined in the kernel header include/linux/sched.h.
> 
> https://jira.sw.ru/browse/PSBM-141577
> Signed-off-by: Alexander Atanasov <alexander.atanasov at virtuozzo.com>
> ---
>   include/uapi/linux/prctl.h |  6 ++++++
>   kernel/sys.c               | 33 +++++++++++++++++++++++++++++++++
>   2 files changed, 39 insertions(+)
> 
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index 4baf1c5b0be7..409bba71a92b 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -277,4 +277,10 @@ struct prctl_task_ct_fields {
>   	__s64 start_boottime;
>   };
>   
> +/* Set task memalloc flags */
> +#define PR_MEMALLOC_FLAGS			1001
> +#define PR_MEMALLOC_GET_FLAGS			1
> +#define PR_MEMALLOC_SET_FLAGS			2
> +#define PR_MEMALLOC_CLEAR_FLAGS			3
> +
>   #endif /* _LINUX_PRCTL_H */
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 54d7bc990e8f..170f179fa4e5 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2313,6 +2313,36 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which,
>   
>   #define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE)
>   
> +#define MEMALLOC_FLAGS_MASK (PF_MEMALLOC | PF_MEMALLOC_NOFS | \
> +                             PF_MEMALLOC_NOIO | PF_MEMALLOC_PIN)
> +
> +static int prctl_memalloc_flags(int opt, unsigned long flags)
> +{
> +	unsigned int pflags;
> +
> +#ifdef CONFIG_VE
> +	if (!ve_is_super(get_exec_env()))
> +		return -ENOSYS;
> +#endif

Other, more generic, approach would be:

	if (!capable(CAP_SYS_ADMIN))

So that only processes with admin cap in init userns will be able to do 
it. We probably don't want to allow this feature to non root on host.

> +	switch(opt) {
> +		case PR_MEMALLOC_GET_FLAGS:
> +			return current->flags & MEMALLOC_FLAGS_MASK;
> +		case PR_MEMALLOC_SET_FLAGS:
> +			if (flags & ~MEMALLOC_FLAGS_MASK)
> +				return -EINVAL;
> +			pflags = current->flags & ~MEMALLOC_FLAGS_MASK;
> +			current->flags = pflags | flags;
> +			return current->flags;
> +		case PR_MEMALLOC_CLEAR_FLAGS:
> +			if (flags & ~MEMALLOC_FLAGS_MASK)
> +				return -EINVAL;
> +			current->flags &= ~flags;
> +			return current->flags;
> +	}
> +
> +	return -EINVAL;
> +}
> +
>   SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>   		unsigned long, arg4, unsigned long, arg5)
>   {
> @@ -2585,6 +2615,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
>   	case PR_SET_TASK_CT_FIELDS:
>   		error = prctl_set_task_ct_fields(me, arg2, arg3);
>   		break;
> +	case PR_MEMALLOC_FLAGS:
> +		error = prctl_memalloc_flags(arg2, arg3);
> +		break;
>   	default:
>   		error = -EINVAL;
>   		break;

-- 
Best regards, Tikhomirov Pavel
Senior Software Developer, Virtuozzo.


More information about the Devel mailing list