[Devel] [PATCH 10/17] oom: boost dying tasks on global oom

Thu Sep 3 05:11:22 PDT 2015

On 03.09.2015 14:06, Kirill Tkhai wrote:
> 
> 
> On 03.09.2015 13:13, Vladimir Davydov wrote:
>> On Thu, Sep 03, 2015 at 01:09:36PM +0300, Kirill Tkhai wrote:
>>>
>>>
>>> On 14.08.2015 20:03, Vladimir Davydov wrote:
>>>> If an OOM victim process has a low priority (nice or via the cpu cgroup),
>>>> it may take a very long time to complete, which is bad, because the system
>>>> cannot make progress until it dies. To avoid that, this patch makes the OOM
>>>> killer set the victim task's priority to the highest possible.
>>>>
>>>> It might be worth submitting this patch upstream. I will probably try.
>>>>
>>>> Signed-off-by: Vladimir Davydov <vdavydov at parallels.com>
>>>> ---
>>>>  mm/oom_kill.c | 17 +++++++++++++++--
>>>>  1 file changed, 15 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>>>> index 0e6f7535a565..ca765a82fa1a 100644
>>>> --- a/mm/oom_kill.c
>>>> +++ b/mm/oom_kill.c
>>>> @@ -294,6 +294,15 @@ enum oom_scan_t oom_scan_process_thread(struct task_struct *task,
>>>>  	return OOM_SCAN_OK;
>>>>  }
>>>>  
>>>> +static void boost_dying_task(struct task_struct *p)
>>>> +{
>>>> +	/*
>>>> +	 * Set the dying task scheduling priority to the highest possible so
>>>> +	 * that it will die quickly irrespective of its scheduling policy.
>>>> +	 */
>>>> +	sched_boost_task(p, 0);
>>>> +}
>>>> +
>>>>  /*
>>>>   * Simple selection loop. We chose the process with the highest
>>>>   * number of 'points'.
>>>> @@ -321,6 +330,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
>>>>  		case OOM_SCAN_CONTINUE:
>>>>  			continue;
>>>>  		case OOM_SCAN_ABORT:
>>>> +			boost_dying_task(p);
>>>
>>> This is a potential livelock: you are holding at least the try_set_zonelist_oom()
>>> bits, and a concurrent thread may be allocating with GFP_NOFAIL in
>>> __alloc_pages_slowpath(). In that case it will loop forever.
>>
>> It won't. There are schedule_timeout calls all over the place. Besides, if
>> try_set_zonelist_oom fails, the caller will call schedule_timeout.
> 
> Really? What if the victim has signal_pending() set?
> 
> Even if it doesn't, you can't rely on schedule_timeout(). There is no guarantee
> that the lock holder will be chosen for execution at all.

Ah, schedule_timeout_uninterruptible() is there, so it's OK. But there are still no guarantees...

>>>
>>> Furthermore, you explicitly call schedule_timeout_killable() in out_of_memory(),
>>> so this problem affects !PREEMPTIBLE kernels too.
>>
>> I don't get this sentence. What's the problem?
> 
> It clarifies the main problem: it affects us as well.
> 
>>>
>>> You mustn't leave the processor before you've cleared the bits.
>>
>> Wrong, see above.
>>