[Devel] [PATCH RHEL7 COMMIT] mm: vmscan: never wait on writeback pages

Konstantin Khorenko khorenko at virtuozzo.com
Mon Jun 27 04:39:40 PDT 2016


The commit is pushed to "branch-rh7-3.10.0-327.18.2.vz7.14.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-327.18.2.vz7.14.21
------>
commit 3f0bcb48c23415a8a123c3c21ff851694dc81dbd
Author: Vladimir Davydov <vdavydov at virtuozzo.com>
Date:   Mon Jun 27 15:39:40 2016 +0400

    mm: vmscan: never wait on writeback pages
    
    Currently, if memcg reclaim encounters a page under writeback, it
    waits for the writeback to finish. This is done to avoid hitting
    OOM when there are a lot of potentially reclaimable pages under
    writeback, as memcg lacks a dirty pages limit. Although this saves
    us from premature OOM, the technique is deadlock-prone if writeback
    is supposed to be done by a process that might itself need to
    allocate memory, as in the case of vstorage. If the process
    responsible for writeback tries to allocate a page, it might get
    stuck in the too_many_isolated() loop waiting for processes
    performing memcg reclaim to put isolated pages back onto the LRU,
    while memcg reclaim is stuck waiting for that very writeback to
    complete, resulting in a deadlock.
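
    For illustration, the throttling loop at the top of
    shrink_inactive_list() that the flusher can get stuck in looks
    roughly like this in a 3.10-era kernel (a simplified sketch, not
    the exact source):

        /* mm/vmscan.c: shrink_inactive_list(), simplified sketch */
        while (unlikely(too_many_isolated(zone, file, sc))) {
                /* back off until other reclaimers put pages back */
                congestion_wait(BLK_RW_ASYNC, HZ/10);

                /* about to die and free our memory - return now */
                if (fatal_signal_pending(current))
                        return SWAP_CLUSTER_MAX;
        }

    If every memcg reclaimer is parked in wait_on_page_writeback()
    with its pages still isolated, too_many_isolated() never drops
    below its threshold and the flusher never leaves this loop.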
    
    To avoid this kind of deadlock, instead of waiting on page
    writeback directly, call congestion_wait() after the isolated pages
    have been returned to the LRU whenever writeback pages are found to
    be recycled through the LRU before their IO can complete. This
    still prevents premature memcg OOM, but makes the deadlock
    described above impossible: reclaim never sleeps while it holds
    pages isolated from the LRU.
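
    Schematically, the resulting flow (condensed from the hunks below)
    is:

        /* shrink_page_list(): case 3, writeback page already marked
         * PageReclaim - do not wait, just account and keep it */
        nr_immediate++;
        goto keep_locked;

        /* shrink_inactive_list(): after putback_inactive_pages() */
        if (!global_reclaim(sc) && nr_immediate)
                congestion_wait(BLK_RW_ASYNC, HZ/10);

    The stall now happens only after the isolated pages are back on
    the LRU, so a flusher looping in too_many_isolated() can always
    make progress.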
    
    https://jira.sw.ru/browse/PSBM-48115
    
    Signed-off-by: Vladimir Davydov <vdavydov at virtuozzo.com>
---
 mm/vmscan.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3f6ce18..3ac08dd 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -929,11 +929,11 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 *    __GFP_IO|__GFP_FS for this reason); but more thought
 		 *    would probably show more reasons.
 		 *
-		 * 3) memcg encounters a page that is not already marked
+		 * 3) memcg encounters a page that is already marked
 		 *    PageReclaim. memcg does not have any dirty pages
 		 *    throttling so we could easily OOM just because too many
 		 *    pages are in writeback and there is nothing else to
-		 *    reclaim. Wait for the writeback to complete.
+		 *    reclaim. Stall memcg reclaim then.
 		 */
 		if (PageWriteback(page)) {
 			/* Case 1 above */
@@ -954,7 +954,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				 * enough to care.  What we do want is for this
 				 * page to have PageReclaim set next time memcg
 				 * reclaim reaches the tests above, so it will
-				 * then wait_on_page_writeback() to avoid OOM;
+				 * then stall to avoid OOM;
 				 * and it's also appropriate in global reclaim.
 				 */
 				SetPageReclaim(page);
@@ -964,7 +964,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 
 			/* Case 3 above */
 			} else {
-				wait_on_page_writeback(page);
+				nr_immediate++;
+				goto keep_locked;
 			}
 		}
 
@@ -1586,10 +1587,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	if (nr_writeback && nr_writeback == nr_taken)
 		zone_set_flag(zone, ZONE_WRITEBACK);
 
-	/*
-	 * memcg will stall in page writeback so only consider forcibly
-	 * stalling for global reclaim
-	 */
+	if (!global_reclaim(sc) && nr_immediate)
+		congestion_wait(BLK_RW_ASYNC, HZ/10);
+
 	if (global_reclaim(sc)) {
 		/*
 		 * Tag a zone as congested if all the dirty pages scanned were

