[CRIU] [PATCH 0/5] lazy-pages: fix maps06 failure in Jenkins

Andrei Vagin avagin at virtuozzo.com
Thu Nov 30 01:00:20 MSK 2017


Applied, thanks!

On Wed, Nov 22, 2017 at 09:37:07PM +0200, Mike Rapoport wrote:
> Hi,
> 
> The initial intention was to fix the failure of maps06 in Jenkins, but
> along the way I did some minor cleanups around epoll and hopefully improved
> the robustness of remote page fault handling.
> 
> So, in the end this patch set addresses two issues:
> 
> * If the page-server on the source fails, the lazy-pages daemon will wait
> forever for the remote pages because nothing notices that the socket is
> closed. Simple handling of EPOLL{RD}HUP resolves this.
> 
> * If restore takes too much time (e.g. on one of the Jenkins workers), the
> lazy-pages daemon stops polling userfault fds and starts populating tasks'
> memory before it has been properly remapped and registered with uffd.
> The proposed solution is to prevent background memory fetch before restore
> is finished by waiting for a message from the restore.
> 
> Mike Rapoport (5):
>   util: epoll: move comment about timeout decrease to uffd.c
>   util: epoll: rename revent to read event
>   util: epoll: add processing of EPOLL{RD}HUP
>   page-server: implement epoll->hangup_event
>   lazy-pages: do not allow background fetch before restore is finished
> 
>  criu/cr-restore.c   |  3 ++
>  criu/include/uffd.h |  1 +
>  criu/include/util.h | 16 +++++++++-
>  criu/page-xfer.c    |  9 +++++-
>  criu/uffd.c         | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>  criu/util.c         | 44 ++++++++++++++++++++++-----
>  6 files changed, 147 insertions(+), 12 deletions(-)
> 
> -- 
> 2.7.4
> 
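For readers unfamiliar with the first issue in the cover letter above, here is
a minimal, standalone sketch of detecting a closed peer with EPOLL{RD}HUP. It
is not the code from this series: the names poll_once(), watch_fd() and demo()
and the enum values are hypothetical, and an AF_UNIX socketpair stands in for
the page-server connection. Only the epoll and socket calls are the real Linux
API.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch (not CRIU's). */
enum poll_result {
	POLL_ERROR = -1,
	POLL_TIMEOUT = 0,
	POLL_READABLE = 1,
	POLL_HANGUP = 2,
};

/* Register fd for input and for peer-shutdown notification. */
static int watch_fd(int epfd, int fd)
{
	struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP, .data.fd = fd };

	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Wait for one event and classify it. */
static int poll_once(int epfd, int timeout_ms)
{
	struct epoll_event ev;
	int n;

	assert(epfd >= 0);
	n = epoll_wait(epfd, &ev, 1, timeout_ms);
	if (n < 0)
		return POLL_ERROR;
	if (n == 0)
		return POLL_TIMEOUT;
	/*
	 * Check hangup before EPOLLIN: a closed peer is also reported as
	 * readable, so testing EPOLLIN alone would miss the closed socket.
	 */
	if (ev.events & (EPOLLRDHUP | EPOLLHUP))
		return POLL_HANGUP;
	if (ev.events & EPOLLIN)
		return POLL_READABLE;
	return POLL_ERROR;
}

/* Poll one end of a socketpair, optionally after closing the other end. */
static int demo(int close_peer)
{
	int sk[2], epfd, ret = POLL_ERROR;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sk) < 0)
		return POLL_ERROR;
	epfd = epoll_create1(0);
	if (epfd >= 0 && watch_fd(epfd, sk[0]) == 0) {
		if (close_peer) {
			close(sk[1]);
			sk[1] = -1;
		}
		ret = poll_once(epfd, 100);
	}
	if (epfd >= 0)
		close(epfd);
	if (sk[1] >= 0)
		close(sk[1]);
	close(sk[0]);
	return ret;
}
```

With the peer closed, demo(1) reports a hangup instead of waiting for data
that will never arrive; with the peer open and idle, demo(0) simply times out.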

