[CRIU] [PATCH 0/5] lazy-pages: fix maps06 failure in Jenkins

Mike Rapoport rppt at linux.vnet.ibm.com
Wed Nov 22 22:37:07 MSK 2017


Hi,

The initial intention was to fix the failure of maps06 in Jenkins, but along
the way I've done some minor cleanups around epoll and, hopefully, improved
the robustness of remote page fault handling.

So, in the end, these patches address two issues:

* If the page-server on the source fails, the lazy-pages daemon will wait
forever for the remote pages, because nothing notices that the socket has
been closed. Simple handling of EPOLL{RD}HUP resolves this.

* If restore takes too long (e.g. on one of the Jenkins workers), the
lazy-pages daemon stops polling the userfault fds and starts populating the
tasks' memory before it has been properly remapped and registered with uffd.
The proposed solution is to prevent background memory fetch until restore
has finished, by waiting for a message from the restore.
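A minimal sketch of both ideas, assuming a simplified single-threaded epoll loop (Linux only). It is loosely modelled on the read_event/hangup_event callbacks named in the shortlog, but `struct ev_fd`, the callback signatures, `restore_finished`, `page_server_gone`, `poll_once()`, `may_fetch_in_background()` and the "done" message are all hypothetical and do not reflect the actual code in criu/util.c or criu/uffd.c. A hangup on the page-server socket makes the loop bail out instead of waiting forever, and background fetch stays disabled until restore reports completion.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical per-fd descriptor; not CRIU's real structure from util.h. */
struct ev_fd {
	int fd;
	int (*read_event)(struct ev_fd *e);   /* EPOLLIN: data to read */
	int (*hangup_event)(struct ev_fd *e); /* EPOLLHUP/EPOLLRDHUP: peer closed */
};

/* Hypothetical daemon state used only by this sketch. */
static bool restore_finished; /* set once restore says it is done */
static bool page_server_gone; /* set when the page-server socket hangs up */

/* Restore side: a "done" message unlocks background fetch.
 * Simplified framing: assumes the whole message arrives in one read(). */
static int restore_read(struct ev_fd *e)
{
	char msg[16] = { 0 };
	ssize_t n = read(e->fd, msg, sizeof(msg) - 1);

	if (n > 0 && strcmp(msg, "done") == 0)
		restore_finished = true;
	return 0;
}

/* Page-server side: the peer closed its socket, so stop waiting for pages. */
static int page_server_hangup(struct ev_fd *e)
{
	fprintf(stderr, "page-server on fd %d hung up\n", e->fd);
	page_server_gone = true;
	return -1; /* ask the event loop to bail out */
}

/* Register an fd; EPOLLRDHUP must be requested explicitly. */
static int add_fd(int epfd, struct ev_fd *e)
{
	struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP, .data.ptr = e };

	return epoll_ctl(epfd, EPOLL_CTL_ADD, e->fd, &ev);
}

/* One epoll_wait() round: returns the number of events, or -1 on error
 * or when a handler asks to stop. */
static int poll_once(int epfd, int timeout_ms)
{
	struct epoll_event evs[8];
	int n = epoll_wait(epfd, evs, 8, timeout_ms);

	for (int i = 0; i < n; i++) {
		struct ev_fd *e = evs[i].data.ptr;

		/* Drain pending data first, then handle the hangup. */
		if ((evs[i].events & EPOLLIN) && e->read_event &&
		    e->read_event(e) < 0)
			return -1;
		if (evs[i].events & (EPOLLHUP | EPOLLRDHUP)) {
			/* HUP stays pending, so drop the fd from the set */
			epoll_ctl(epfd, EPOLL_CTL_DEL, e->fd, NULL);
			if (e->hangup_event && e->hangup_event(e) < 0)
				return -1;
		}
	}
	return n;
}

/* Background fetch only after restore is done and the source is alive. */
static bool may_fetch_in_background(void)
{
	return restore_finished && !page_server_gone;
}
```

Removing the fd from the epoll set on hangup matters here: EPOLLHUP is always reported, whether requested or not, and remains pending, so a level-triggered loop that merely ignored it would keep returning from epoll_wait() immediately.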

Mike Rapoport (5):
  util: epoll: move comment about timeout decrease to uffd.c
  util: epoll: rename revent to read event
  util: epoll: add processing of EPOLL{RD}HUP
  page-server: implement epoll->hangup_event
  lazy-pages: do not allow background fetch before restore is finished

 criu/cr-restore.c   |  3 ++
 criu/include/uffd.h |  1 +
 criu/include/util.h | 16 +++++++++-
 criu/page-xfer.c    |  9 +++++-
 criu/uffd.c         | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 criu/util.c         | 44 ++++++++++++++++++++++-----
 6 files changed, 147 insertions(+), 12 deletions(-)

-- 
2.7.4


