[CRIU] [PATCH 0/5] lazy-pages: fix maps06 failure in Jenkins
Andrei Vagin
avagin at virtuozzo.com
Thu Nov 30 01:00:20 MSK 2017
Applied, thanks!
On Wed, Nov 22, 2017 at 09:37:07PM +0200, Mike Rapoport wrote:
> Hi,
>
> The initial intention was to fix the failure of maps06 in Jenkins, but along
> the way I've done some minor cleanups around epoll and hopefully improved
> the robustness of remote page fault handling.
>
> So, in the end this patch set addresses two issues:
>
> * If the page-server on the source fails, the lazy-pages daemon will wait
> forever for the remote pages because nothing notices that the socket is
> closed. Simple handling of EPOLL{RD}HUP resolves this.
>
> * If restore takes too much time (e.g. on one of the Jenkins workers), the
> lazy-pages daemon stops polling userfault fds and starts populating tasks'
> memory before it has been properly remapped and registered with uffd.
> The proposed solution is to prevent background memory fetch before restore
> is finished by waiting for a message from the restore.
>
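For readers unfamiliar with the first issue, the sketch below shows the general mechanism of noticing a closed peer through epoll. It is not the code from this series: wait_for_hangup() and demo_peer_close() are hypothetical names, and a local socketpair() stands in for the connection to the page-server. It only uses the standard epoll calls and the EPOLLRDHUP/EPOLLHUP event bits.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of CRIU: wait up to timeout_ms for an
 * event on sk. Returns 1 if the peer hung up, 0 on timeout or if only
 * data arrived, -1 on error.
 */
static int wait_for_hangup(int sk, int timeout_ms)
{
	/* EPOLLHUP is always reported; EPOLLRDHUP must be requested. */
	struct epoll_event ev = { .events = EPOLLIN | EPOLLRDHUP, .data.fd = sk };
	struct epoll_event out;
	int epfd, n, ret = -1;

	assert(sk >= 0);

	epfd = epoll_create1(0);
	if (epfd < 0)
		return -1;

	if (epoll_ctl(epfd, EPOLL_CTL_ADD, sk, &ev) == 0) {
		n = epoll_wait(epfd, &out, 1, timeout_ms);
		if (n == 0)
			ret = 0;
		else if (n > 0)
			ret = !!(out.events & (EPOLLRDHUP | EPOLLHUP));
	}

	close(epfd);
	return ret;
}

/* Stand-in for a failing page-server: close one end of a socket pair. */
static int demo_peer_close(void)
{
	int sk[2], ret;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sk))
		return -1;

	close(sk[1]);
	ret = wait_for_hangup(sk[0], 1000);
	close(sk[0]);
	return ret;
}
```

Without checking these bits, a loop that only acts on EPOLLIN with a complete message can keep waiting on a socket whose peer is already gone, which is the hang described above.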
> Mike Rapoport (5):
> util: epoll: move comment about timeout decrease to uffd.c
> util: epoll: rename revent to read event
> util: epoll: add processing of EPOLL{RD}HUP
> page-server: implement epoll->hangup_event
> lazy-pages: do not allow background fetch before restore is finished
>
> criu/cr-restore.c | 3 ++
> criu/include/uffd.h | 1 +
> criu/include/util.h | 16 +++++++++-
> criu/page-xfer.c | 9 +++++-
> criu/uffd.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++--
> criu/util.c | 44 ++++++++++++++++++++++-----
> 6 files changed, 147 insertions(+), 12 deletions(-)
>
> --
> 2.7.4
>
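The second fix described in the cover letter, holding back background fetch until the restore side reports completion, can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the implementation from the patch: struct lazy_state, RESTORE_FINISHED_MSG, check_restore_finished() and may_fetch_in_background() are hypothetical names, and the message format (a single int over a local socket) is invented for the example.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

#define RESTORE_FINISHED_MSG 1	/* hypothetical message code */

/* Hypothetical daemon state: socket to the restorer plus a latch. */
struct lazy_state {
	int restore_sk;
	bool restore_finished;
};

/*
 * Returns 1 once the "restore finished" message has been seen,
 * 0 if it has not arrived within timeout_ms, -1 on error.
 */
static int check_restore_finished(struct lazy_state *st, int timeout_ms)
{
	struct pollfd pfd = { .fd = st->restore_sk, .events = POLLIN };
	int msg, n;

	assert(st != NULL);

	if (st->restore_finished)
		return 1;

	n = poll(&pfd, 1, timeout_ms);
	if (n <= 0)
		return n;	/* 0: nothing yet, -1: poll error */

	if (read(st->restore_sk, &msg, sizeof(msg)) != sizeof(msg))
		return -1;	/* short read or peer closed */
	if (msg != RESTORE_FINISHED_MSG)
		return -1;

	st->restore_finished = true;
	return 1;
}

/* Page faults are still served; only background fetch is gated. */
static bool may_fetch_in_background(struct lazy_state *st)
{
	return check_restore_finished(st, 0) == 1;
}

/* Returns 0 if the gate is closed before the message and open after. */
static int demo_gate(void)
{
	int sk[2], msg = RESTORE_FINISHED_MSG, ret = -1;
	struct lazy_state st;
	bool before, after;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sk))
		return -1;

	st.restore_sk = sk[0];
	st.restore_finished = false;

	before = may_fetch_in_background(&st);
	if (write(sk[1], &msg, sizeof(msg)) == sizeof(msg)) {
		after = may_fetch_in_background(&st);
		ret = (!before && after) ? 0 : -1;
	}

	close(sk[0]);
	close(sk[1]);
	return ret;
}
```

The latch in restore_finished means the socket is consulted only until the message arrives; after that the check costs nothing on each pass of the event loop.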