[Devel] [PATCH RHEL7 COMMIT] ve/oom: do not impose grouping rules if process score is adjusted

Mon Jun 8 09:10:29 PDT 2015

The commit is pushed to "branch-rh7-3.10.0-123.1.2-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-123.1.2.vz7.5.11
------>
commit e4893adf40349a53bec70291f037740d0ff8985b
Author: Vladimir Davydov <vdavydov at parallels.com>
Date:   Mon Jun 8 20:10:29 2015 +0400

    ve/oom: do not impose grouping rules if process score is adjusted
    
    Series description:
    
    This patch set fixes /proc/PID/{oom_score_adj,oom_adj,oom_score}
    behavior inside a CT, resurrecting /proc/vz/oom_score_adj along the way.
    For more details, see the individual patches.
    
    https://jira.sw.ru/browse/PSBM-33849
    ====================================================================
    This patch description:
    
    This is a sort of port of diff-oom-fixup-automatic-oom-score-adjustment.
    "Sort of" because it does not operate in quite the same way. In RH6
    there is a special value for oom_score_adj (OOM_SCORE_ADJ_UNSET=1001),
    which means "the score has never been adjusted and currently equals 0".
    I find this design ugly, because there are many places where we read
    the score value, e.g. to dump it to the log. Seeing 1001, which is
    greater than the maximum possible value of 1000, would bewilder the
    user.
    
    So I change the logic. Now we impose oom grouping rules iff
    oom_score_adj equals 0. The difference from RH6 is therefore that by
    setting the value back to 0, the user re-enables grouping rules. This
    looks quite natural to me: a user who actually wants to change OOM
    killer behavior will obviously set oom_score_adj to something non-zero.
    
    Since systemd allows tweaking oom_score_adj per unit, the whole oom
    grouping mechanism is going to become obsolete, so a minor change in
    its behavior is not critical.
    
    Signed-off-by: Vladimir Davydov <vdavydov at parallels.com>
    Acked-by: Andrew Vagin <avagin at odin.com>
---
 fs/proc/base.c           | 6 ------
 include/bc/beancounter.h | 1 -
 kernel/bc/proc.c         | 3 ---
 mm/oom_group.c           | 9 ++++-----
 4 files changed, 4 insertions(+), 15 deletions(-)
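
The behavioral difference described in the commit message can be sketched in
isolation as follows. This is a minimal illustration, not the kernel code:
group_rule_adj() and its return value of 500 are hypothetical stand-ins for
whatever the grouping rules in mm/oom_group.c would assign, and the two
effective_adj_*() helpers are names invented for this sketch. Only
OOM_SCORE_ADJ_UNSET=1001 and the "non-zero means user-adjusted" rule come
from the message itself.

```c
/* RH6 sentinel named in the commit message: "never adjusted, reads as 0". */
#define OOM_SCORE_ADJ_UNSET 1001

/* Hypothetical stand-in for the adjustment a grouping rule would assign. */
static int group_rule_adj(void)
{
	return 500;
}

/*
 * RH6-style logic (sketch): grouping rules apply only while the stored
 * value is still the sentinel. Once the user writes any value, even 0,
 * that value is used as-is.
 */
static int effective_adj_rh6(int stored)
{
	if (stored != OOM_SCORE_ADJ_UNSET)
		return stored;
	return group_rule_adj();
}

/*
 * Logic after this patch (sketch): a non-zero value is treated as a user
 * adjustment and returned directly; 0 falls through to the grouping rules.
 */
static int effective_adj_rh7(int stored)
{
	if (stored != 0)
		return stored;
	return group_rule_adj();
}
```

Under these assumptions, a task whose score was explicitly written back to 0
keeps 0 with the RH6-style check but is subject to the grouping rules again
with the new check, while any non-zero value bypasses the rules in both.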

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 08d4a62..360169a 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1121,12 +1121,6 @@ static ssize_t oom_score_adj_write(struct file *file, const char __user *buf,
 		task->signal->oom_score_adj_min = (short)oom_score_adj;
 	trace_oom_score_adj_update(task);
 
-	/*
-	 * Container uses modern interface, seems like it know what to do.
-	 * So, we can disable automaic oom-score adjustments.
-	 */
-	set_bit(UB_OOM_MANUAL_SCORE_ADJ, &get_exec_ub()->ub_flags);
-
 err_sighand:
 	unlock_task_sighand(task, &flags);
 err_task_lock:
diff --git a/include/bc/beancounter.h b/include/bc/beancounter.h
index ea05b11..3f887cc 100644
--- a/include/bc/beancounter.h
+++ b/include/bc/beancounter.h
@@ -149,7 +149,6 @@ struct user_beancounter {
 
 enum ub_flags {
 	UB_DIRTY_EXCEEDED,
-	UB_OOM_MANUAL_SCORE_ADJ,
 };
 
 extern int ub_count;
diff --git a/kernel/bc/proc.c b/kernel/bc/proc.c
index d333a1a..e0ddbf1 100644
--- a/kernel/bc/proc.c
+++ b/kernel/bc/proc.c
@@ -103,9 +103,6 @@ static int bc_debug_show(struct seq_file *f, void *v)
 	seq_printf(f, "bc: %p\n", ub);
 	seq_printf(f, "sizeof: %lu\n", sizeof(struct user_beancounter));
 
-	seq_printf(f, "oom_score_adj: %s\n", (ub->ub_flags &
-				UB_OOM_MANUAL_SCORE_ADJ) ? "manual" : "auto");
-
 	return 0;
 }
 
diff --git a/mm/oom_group.c b/mm/oom_group.c
index 2401eed..f2d54e5 100644
--- a/mm/oom_group.c
+++ b/mm/oom_group.c
@@ -58,12 +58,11 @@ int get_task_oom_score_adj(struct task_struct *t)
 	unsigned long flags;
 	const struct cred *cred;
 	uid_t task_uid;
-	int adj = 0;
+	int adj = t->signal->oom_score_adj;
 
-#ifdef CONFIG_BEANCOUNTERS
-	if (test_bit(UB_OOM_MANUAL_SCORE_ADJ, &get_task_ub(t)->ub_flags))
-		return t->signal->oom_score_adj;
-#endif
+	/* Do not impose grouping rules if the score is adjusted by the user */
+	if (adj != 0)
+		return adj;
 
 	rcu_read_lock();
 	cred = __task_cred(t);