[Devel] [PATCH 10/17] oom: boost dying tasks on global oom
Kirill Tkhai
ktkhai at odin.com
Thu Sep 3 04:06:08 PDT 2015
On 03.09.2015 13:13, Vladimir Davydov wrote:
> On Thu, Sep 03, 2015 at 01:09:36PM +0300, Kirill Tkhai wrote:
>>
>>
>> On 14.08.2015 20:03, Vladimir Davydov wrote:
>>> If an oom victim process has a low priority (via nice or the cpu
>>> cgroup), it may take a very long time to exit, which is bad, because
>>> the system cannot make progress until it dies. To avoid that, this
>>> patch makes the oom killer raise the victim task's priority to the
>>> highest possible.
>>>
>>> It might be worth submitting this patch upstream. I will probably try.
>>>
>>> Signed-off-by: Vladimir Davydov <vdavydov at parallels.com>
>>> ---
>>> mm/oom_kill.c | 17 +++++++++++++++--
>>> 1 file changed, 15 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>>> index 0e6f7535a565..ca765a82fa1a 100644
>>> --- a/mm/oom_kill.c
>>> +++ b/mm/oom_kill.c
>>> @@ -294,6 +294,15 @@ enum oom_scan_t oom_scan_process_thread(struct task_struct *task,
>>>  	return OOM_SCAN_OK;
>>>  }
>>>
>>> +static void boost_dying_task(struct task_struct *p)
>>> +{
>>> +	/*
>>> +	 * Set the dying task scheduling priority to the highest possible so
>>> +	 * that it will die quickly irrespective of its scheduling policy.
>>> +	 */
>>> +	sched_boost_task(p, 0);
>>> +}
>>> +
>>>  /*
>>>   * Simple selection loop. We chose the process with the highest
>>>   * number of 'points'.
>>> @@ -321,6 +330,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
>>>  		case OOM_SCAN_CONTINUE:
>>>  			continue;
>>>  		case OOM_SCAN_ABORT:
>>> +			boost_dying_task(p);
>>
>> This is a potential livelock: you are holding at least the bits set by
>> try_set_zonelist_oom(), and a concurrent thread may be doing a GFP_NOFAIL
>> allocation in __alloc_pages_slowpath(). In that case it will loop forever.
>
> It won't. There are schedule_timeout()s all over the place. Besides, if
> try_set_zonelist_oom() fails, the caller will call schedule_timeout().
Really? What if a victim has the signal_pending() flag set?

Even if it doesn't, you can't rely on schedule_timeout(). There is no
guarantee the lock holder will ever be chosen for execution.
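
For reference, schedule_timeout_killable() is essentially just this
(simplified from kernel/timer.c, not a verbatim quote):

signed long __sched schedule_timeout_killable(signed long timeout)
{
        /*
         * TASK_KILLABLE means a pending fatal signal cancels the sleep:
         * signal_pending_state() in schedule() keeps the task runnable,
         * so an oom victim returns from here almost immediately and any
         * loop around this call degenerates into busy-waiting.
         */
        __set_current_state(TASK_KILLABLE);
        return schedule_timeout(timeout);
}

And even when the caller does sleep, nothing forces the scheduler to pick
the low-priority holder of the zonelist oom bits next.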
>>
>> Furthermore, you call schedule_timeout_killable() manually in
>> out_of_memory(), so this is a problem for !PREEMPTIBLE kernels too.
>
> I don't get this sentence. What's the problem?
It's a clarification of the main problem, showing that it affects us as well.
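
To make the pattern concrete, here is a userspace analogy (illustrative
only, NOT kernel code; every name below is a stand-in for the kernel path
named in the comment next to it):

/*
 * Thread A takes a try-bit and then blocks while holding it, like
 * out_of_memory() sleeping in schedule_timeout_killable() with the
 * zonelist oom bits set.  Thread B retries the try-bit forever, like
 * a __GFP_NOFAIL allocation in __alloc_pages_slowpath().  Pin both
 * threads to one CPU and give B the higher priority, and A never runs
 * to clear the bit.  Build with: cc -pthread livelock.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static atomic_int oom_bit;              /* stands in for ZONE_OOM_LOCKED */

static int try_set_bit(void)            /* ~ try_set_zonelist_oom() */
{
        int expected = 0;

        return atomic_compare_exchange_strong(&oom_bit, &expected, 1);
}

static void *bit_holder(void *arg)      /* ~ the task running the oom killer */
{
        (void)arg;
        try_set_bit();
        sleep(1);                       /* ~ schedule_timeout_killable() */
        atomic_store(&oom_bit, 0);      /* ~ clear_zonelist_oom() */
        return NULL;
}

static void *nofail_allocator(void *arg) /* ~ __GFP_NOFAIL allocation */
{
        (void)arg;
        while (!try_set_bit())
                ;                       /* spins until the holder runs again */
        puts("allocation finally made progress");
        atomic_store(&oom_bit, 0);
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_create(&a, NULL, bit_holder, NULL);
        pthread_create(&b, NULL, nofail_allocator, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
}

On an SMP box this terminates after a second, of course; the point is
that nothing in the GFP_NOFAIL retry loop itself guarantees the bit
holder ever gets CPU time.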
>>
>> You mustn't leave the processor before you have cleared the bits.
>
> Wrong, see above.
>