[Devel] [PATCH 1/2] sched: calculate_imbalance: Fix local->avg_load > sds->avg_load case

Vladimir Davydov vdavydov at parallels.com
Mon Sep 16 01:06:08 PDT 2013


On 09/16/2013 09:52 AM, Peter Zijlstra wrote:
> On Sun, Sep 15, 2013 at 05:49:13PM +0400, Vladimir Davydov wrote:
>> In busiest->group_imb case we can come to calculate_imbalance() with
>> local->avg_load >= busiest->avg_load >= sds->avg_load. This can result
>> in imbalance overflow, because it is calculated as follows
>>
>> env->imbalance = min(
>> 	max_pull * busiest->group_power,
>> 	(sds->avg_load - local->avg_load) * local->group_power
>> ) / SCHED_POWER_SCALE;
>>
>> As a result we can end up constantly bouncing tasks from one cpu to
>> another if there are pinned tasks.
>>
>> Fix this by skipping the assignment and assuming imbalance=0 in case
>> local->avg_load > sds->avg_load.
>> --
>> The bug can be caught by running 2*N cpuhogs pinned to two logical cpus
>> belonging to different cores on an HT-enabled machine with N logical
>> cpus: just look at se.nr_migrations growth.
>>
>> Signed-off-by: Vladimir Davydov<vdavydov at parallels.com>
>> ---
>>   kernel/sched/fair.c |    3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 9b3fe1c..507a8a9 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4896,7 +4896,8 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
>>   	 * max load less than avg load(as we skip the groups at or below
>>   	 * its cpu_power, while calculating max_load..)
>>   	 */
>> -	if (busiest->avg_load < sds->avg_load) {
>> +	if (busiest->avg_load <= sds->avg_load ||
>> +	    local->avg_load >= sds->avg_load) {
>>   		env->imbalance = 0;
>>   		return fix_small_imbalance(env, sds);
>>   	}
> Why the = part? Surely 'busiest->avg_load < sds->avg_load ||
> local->avg_load > sds->avg_load' avoids both underflows?

Of course it does, but env->imbalance will be set to 0 anyway in the '='
case, so why not take the shortcut?
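
To illustrate both points, here is a small userspace sketch of the formula
quoted above. It is not the kernel code: imbalance_model() and min_ul() are
hypothetical names, and the only assumptions are that all quantities are
unsigned long and that SCHED_POWER_SCALE is 1024.

```c
/* Userspace model of the quoted formula; NOT the kernel implementation. */
#define SCHED_POWER_SCALE 1024UL	/* assumed value */

/* Hypothetical helper standing in for the kernel's min() macro. */
static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical model of the env->imbalance assignment. If
 * local_avg_load > sds_avg_load, the unsigned subtraction wraps around
 * to a huge value, so min() stops limiting the pull by what the local
 * group can take.
 */
static unsigned long imbalance_model(unsigned long max_pull,
				     unsigned long busiest_power,
				     unsigned long sds_avg_load,
				     unsigned long local_avg_load,
				     unsigned long local_power)
{
	return min_ul(max_pull * busiest_power,
		      (sds_avg_load - local_avg_load) * local_power)
		/ SCHED_POWER_SCALE;
}
```

With max_pull = 100, both powers = 1024, sds_avg_load = 1000 and
local_avg_load = 1200, this returns 100 where 0 is wanted. With
local_avg_load == sds_avg_load the second term is 0, and with max_pull == 0
the first term is 0, so in either '=' case the result is 0 regardless.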


