[Devel] [PATCH 07/17] Port diff-sched-introduce-cond_resched_may_throttle
Kirill Tkhai
ktkhai at odin.com
Thu Sep 3 01:56:57 PDT 2015
On 14.08.2015 20:03, Vladimir Davydov wrote:
> This patch reintroduces cond_resched_may_throttle(), which acts exactly
> like cond_resched() except that it does not forbid CFS bandwidth-induced
> throttling. It will be used in the following patch.
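
A minimal usage sketch of where the new helper is meant to sit; worker_fn()
and do_unit_of_work() below are hypothetical and not taken from this series:

#include <linux/kthread.h>
#include <linux/sched.h>

/* Stand-in for whatever real work the thread does. */
static void do_unit_of_work(void *arg)
{
}

/*
 * Hypothetical long-running kernel thread that agrees to be a CFS
 * bandwidth throttling point at its resched call.
 */
static int worker_fn(void *arg)
{
	while (!kthread_should_stop()) {
		do_unit_of_work(arg);

		/*
		 * Plain cond_resched() never throttles the caller; this
		 * variant lets the scheduler enforce the task group's
		 * CPU limit right here.
		 */
		cond_resched_may_throttle();
	}
	return 0;
}
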
>
> =============================================================================
> Author: Vladimir Davydov
> Email: vdavydov at parallels.com
> Subject: sched: introduce cond_resched_may_throttle
> Date: Fri, 15 Mar 2013 18:53:12 +0400
>
> Since cond_resched() is sometimes called under a semaphore, throttling
> tasks there was forbidden in order to eliminate the possibility of
> priority inversion. However, it turned out that some tasks must be
> throttled on cond_resched(), otherwise they won't have a chance to be
> throttled at all, breaking the concept of CPU limits. The most notable
> (and currently the only identified) example is VM hypervisors such as
> KVM or Balalaika.
>
> To fix this problem, the patch introduces a new function,
> cond_resched_may_throttle(), which works just like the usual
> cond_resched() except that it allows the scheduler to throttle the
> caller's task group. This function must be used by those pieces of
> software that, under certain conditions, can only be throttled on
> cond_resched(). It will be used by Balalaika - I'm going to send the
> corresponding patch. Perhaps it's also worthwhile to use it in KVM;
> however, there is no rush, because I doubt anyone will use KVM,
> vzkernel, and our hacked CPU limits all together, so it can wait.
>
> https://jira.sw.ru/browse/PSBM-18888
>
> Signed-off-by: Vladimir Davydov <vdavydov at parallels.com>
> =============================================================================
>
> Signed-off-by: Vladimir Davydov <vdavydov at parallels.com>
> ---
> include/linux/sched.h | 8 ++++++++
> kernel/sched/core.c | 11 +++++++++++
> kernel/sched/fair.c | 3 ++-
> 3 files changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 65fedc487c92..3bd2f50822b8 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1245,6 +1245,7 @@ struct task_struct {
> unsigned sched_interruptible_sleep:1;
>
> unsigned woken_while_running:1;
> + unsigned may_throttle:1;
>
> pid_t pid;
> pid_t tgid;
> @@ -2693,6 +2694,13 @@ extern int _cond_resched(void);
> _cond_resched(); \
> })
>
> +extern int __cond_resched_may_throttle(void);
> +
> +#define cond_resched_may_throttle() ({ \
> + __might_sleep(__FILE__, __LINE__, 0); \
> + __cond_resched_may_throttle(); \
> +})
> +
> extern int __cond_resched_lock(spinlock_t *lock);
>
> #ifdef CONFIG_PREEMPT_COUNT
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index bbb6fc3251ad..77d330cd2b79 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4790,6 +4790,17 @@ int __sched _cond_resched(void)
> }
> EXPORT_SYMBOL(_cond_resched);
>
> +int __sched __cond_resched_may_throttle(void)
> +{
> + if (should_resched()) {
> + current->may_throttle = 1;
> + __cond_resched();
> + current->may_throttle = 0;
> + return 1;
> + }
> + return 0;
> +}
> +
> /*
> * __cond_resched_lock() - if a reschedule is pending, drop the given lock,
> * call schedule, and on return reacquire the lock.
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 167d0f66387b..49a93569b5e6 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -965,7 +965,8 @@ static inline void update_entity_boost(struct sched_entity *se)
> p->woken_while_running;
> p->woken_while_running = 0;
> } else
> - se->boosted = sched_feat(BOOST_PREEMPT);
> + se->boosted = sched_feat(BOOST_PREEMPT) &&
> + !p->may_throttle;
This function is unsafe, because there is a case where put_prev_task() is
called before dequeue_task() in the scheduler. As a result, we clear the
boosted flag but leave the se linked in the boosted list, where it must
not be.

I suggest unconditionally dequeueing boosted entities where you currently
do that, without checking entity_boosted(). Of course, you then need to
use list_del_init() instead of plain list_del(). With that in place you
will be able to change the se->boosted flag the way you want in this
function.
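
I.e., something like this (untested, and the field name "boost_node" is
just a guess at whatever the boosted-list member is really called):

	-	if (entity_boosted(se))
	-		list_del(&se->boost_node);
	+	list_del_init(&se->boost_node);

Since list_del_init() leaves the node reinitialized, removing an entity
that has already been unlinked is harmless, so the check can go away.
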
> }
> }
>
>