[Devel] [PATCH 07/17] Port diff-sched-introduce-cond_resched_may_throttle

Kirill Tkhai ktkhai at odin.com
Thu Sep 3 01:56:57 PDT 2015



On 14.08.2015 20:03, Vladimir Davydov wrote:
> This patch reintroduces cond_resched_may_throttle(), which acts exactly
> as cond_resched() except it does not forbid CFS bandwidth induced
> throttling. It will be used in the following patch.
> 
> =============================================================================
> Author: Vladimir Davydov
> Email: vdavydov at parallels.com
> Subject: sched: introduce cond_resched_may_throttle
> Date: Fri, 15 Mar 2013 18:53:12 +0400
> 
> Since cond_resched() is sometimes called under a semaphore, it was
> forbidden to throttle tasks there in order to eliminate the possibility
> of the priority inversion problem. However, it turned out that some
> tasks must be throttled on cond_resched(), otherwise they won't have a
> chance to be throttled at all breaking the concept of CPU limits. The
> most notable (and currently the only identified) example is VM
> hypervisors such as KVM or Balalaika.
> 
> To fix this problem, the patch introduces the new function
> cond_resched_may_throttle(), which works just like usual cond_resched()
> except it allows the scheduler to throttle the caller's task group. This
> function must be used by those pieces of software that can only be
> throttled on cond_resched() under certain conditions. It will be used
> by Balalaika; I'm going to send the corresponding patch. Perhaps it is
> also worthwhile to use it in KVM, but there is no rush: I doubt anyone
> will use KVM, vzkernel, and our hacked CPU limits all together, so it
> can wait.
> 
> https://jira.sw.ru/browse/PSBM-18888
> 
> Signed-off-by: Vladimir Davydov <vdavydov at parallels.com>
> =============================================================================
> 
> Signed-off-by: Vladimir Davydov <vdavydov at parallels.com>
> ---
>  include/linux/sched.h |  8 ++++++++
>  kernel/sched/core.c   | 11 +++++++++++
>  kernel/sched/fair.c   |  3 ++-
>  3 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 65fedc487c92..3bd2f50822b8 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1245,6 +1245,7 @@ struct task_struct {
>  	unsigned sched_interruptible_sleep:1;
>  
>  	unsigned woken_while_running:1;
> +	unsigned may_throttle:1;
>  
>  	pid_t pid;
>  	pid_t tgid;
> @@ -2693,6 +2694,13 @@ extern int _cond_resched(void);
>  	_cond_resched();			\
>  })
>  
> +extern int __cond_resched_may_throttle(void);
> +
> +#define cond_resched_may_throttle() ({		\
> +	__might_sleep(__FILE__, __LINE__, 0);	\
> +	__cond_resched_may_throttle();		\
> +})
> +
>  extern int __cond_resched_lock(spinlock_t *lock);
>  
>  #ifdef CONFIG_PREEMPT_COUNT
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index bbb6fc3251ad..77d330cd2b79 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4790,6 +4790,17 @@ int __sched _cond_resched(void)
>  }
>  EXPORT_SYMBOL(_cond_resched);
>  
> +int __sched __cond_resched_may_throttle(void)
> +{
> +	if (should_resched()) {
> +		current->may_throttle = 1;
> +		__cond_resched();
> +		current->may_throttle = 0;
> +		return 1;
> +	}
> +	return 0;
> +}
> +
>  /*
>   * __cond_resched_lock() - if a reschedule is pending, drop the given lock,
>   * call schedule, and on return reacquire the lock.
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 167d0f66387b..49a93569b5e6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -965,7 +965,8 @@ static inline void update_entity_boost(struct sched_entity *se)
>  					p->woken_while_running;
>  			p->woken_while_running = 0;
>  		} else
> -			se->boosted = sched_feat(BOOST_PREEMPT);
> +			se->boosted = sched_feat(BOOST_PREEMPT) &&
> +					!p->may_throttle;

This function is unsafe, because there is a case where the scheduler
calls put_prev_task() before dequeue_task(). In that case we clear the
boosted flag but leave the se linked in the boosted list, where it must
not remain.

I suggest unconditionally dequeueing boosted entities where you
currently do that, without checking entity_boosted(). Of course, you
then need to use list_del_init() instead of plain list_del(). With that
change you will be free to flip se->boosted in this function the way
you want.

>  	}
>  }
>  
> 


