[Devel] [PATCH rh7 1/2] mm: vmscan: never wait on writeback pages

Vladimir Davydov vdavydov at virtuozzo.com
Mon Jun 27 03:34:05 PDT 2016


Currently, if memcg reclaim encounters a page under writeback, it waits
for the writeback to finish. This is done in order to avoid hitting OOM
when there are a lot of potentially reclaimable pages under writeback,
as memcg lacks a dirty page limit. Although it saves us from premature
OOM, this technique is deadlock prone if writeback is supposed to be
done by a process that might need to allocate memory, as in the case of
vstorage. If the process responsible for writeback tries to allocate a
page, it might get stuck in the too_many_isolated() loop waiting for
processes performing memcg reclaim to put isolated pages back on the
LRU, while memcg reclaim is itself stuck waiting for the writeback to
complete, resulting in a deadlock.

To avoid this kind of deadlock, instead of waiting for page writeback
directly, call congestion_wait() after returning the isolated pages to
the LRU, in case pages under writeback get recycled through the LRU
before the IO can complete. This should still prevent premature memcg
OOM while making the deadlock described above impossible.

https://jira.sw.ru/browse/PSBM-48115

Signed-off-by: Vladimir Davydov <vdavydov at virtuozzo.com>
---
 mm/vmscan.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3f6ce18df3ed..3ac08ddf50b8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -929,11 +929,11 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 *    __GFP_IO|__GFP_FS for this reason); but more thought
 		 *    would probably show more reasons.
 		 *
-		 * 3) memcg encounters a page that is not already marked
+		 * 3) memcg encounters a page that is already marked
 		 *    PageReclaim. memcg does not have any dirty pages
 		 *    throttling so we could easily OOM just because too many
 		 *    pages are in writeback and there is nothing else to
-		 *    reclaim. Wait for the writeback to complete.
+		 *    reclaim. Stall memcg reclaim then.
 		 */
 		if (PageWriteback(page)) {
 			/* Case 1 above */
@@ -954,7 +954,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				 * enough to care.  What we do want is for this
 				 * page to have PageReclaim set next time memcg
 				 * reclaim reaches the tests above, so it will
-				 * then wait_on_page_writeback() to avoid OOM;
+				 * then stall to avoid OOM;
 				 * and it's also appropriate in global reclaim.
 				 */
 				SetPageReclaim(page);
@@ -964,7 +964,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 
 			/* Case 3 above */
 			} else {
-				wait_on_page_writeback(page);
+				nr_immediate++;
+				goto keep_locked;
 			}
 		}
 
@@ -1586,10 +1587,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	if (nr_writeback && nr_writeback == nr_taken)
 		zone_set_flag(zone, ZONE_WRITEBACK);
 
-	/*
-	 * memcg will stall in page writeback so only consider forcibly
-	 * stalling for global reclaim
-	 */
+	if (!global_reclaim(sc) && nr_immediate)
+		congestion_wait(BLK_RW_ASYNC, HZ/10);
+
 	if (global_reclaim(sc)) {
 		/*
 		 * Tag a zone as congested if all the dirty pages scanned were
-- 
2.1.4
