[CRIU] [RFC PATCH] Try to include userfaultfd with criu
Adrian Reber
adrian at lisas.de
Mon Nov 9 05:31:54 PST 2015
On Thu, Sep 24, 2015 at 06:44:16PM +0300, Pavel Emelyanov wrote:
> On 09/24/2015 04:04 PM, adrian at lisas.de wrote:
> > From: Adrian Reber <areber at redhat.com>
> >
> > This is a first try to include userfaultfd with criu. Right now it
> > still requires a "normal" checkpoint. After checkpointing the application
> > it can be restored with the help of userfaultfd.
>
> Thanks for looking into this :) Please, see my comments inline.
Thanks for your comments. Took me some time to look at uffd code again.
> > The normal restore still copies all pages from the checkpoint to
> > the right location (this still needs to be disabled) and the usage
> > of userfaultfd is right now hardcoded in this patch.
> >
> > All restored pages with MAP_ANONYMOUS set are marked as being handled by
> > userfaultfd and also madvise()'d as MADV_DONTNEED.
> >
> > If I also enable userfaultfd for pages without MAP_ANONYMOUS the restored process
> > segfaults. I have not looked into more details into why it segfaults.
>
> Hm... IIRC the uffd from Andrea only worked on anonymous private areas, so
> when you turned one one (and didn't check the ret code from register ioctl ;)
> the respective vma just wasn't restored at all.
Yes, I think I also understood it now.
> > As soon as the process is restored it blocks on the first memory access
> > and waits for pages being transferred by userfaultfd.
> >
> > To handle the required pages a new criu command has been added. The restore
> > works now like this:
> >
> > criu restore -D /tmp/3 -j -v4
> >
> > This hangs after the restored process is running and needs:
> >
> > criu uffd -v4 -D /tmp/3/
>
> So the plan for uffd action is to start a daemon that would read
> pages from images and put them into some uffd descriptor, right?
Yes, I have the normal criu restore process running now like this:
$ ./criu restore -D /tmp/3 -j -v4 --lazy-page
and it hangs on the uffd file descriptor until criu uffd daemon is
running:
$ ./criu uffd -v4 -D /tmp/3/
> > This waits on a UFFD FD which has been passed by the 'criu restore' process over
> > unix domain sockets for UFFD requests. For my current test programm following
> > pages are transmitted over UFFD:
> >
>
> [snip]
>
> > TODO:
> > * Provide parameter for restore to not always use UFFD like described in the wiki:
> > http://criu.org/Userfaultfd : --lazy-pages
> >
> > * Do not restore pages which will be transferred via UFFD.
> > They still need to be mapped.
> >
> > * Unix domain sockets with credentials.
>
> * Pick up the non-cooperative uffd patches, as real apps may fork() and mremap()
> memory and this would screw things up.
>
> Sigh... I will try to find time to rebase the patches onto recent tree and resend
> them, but if you will get blocked by this, let me know :)
Not necessary yet...
> > @@ -2650,6 +2654,131 @@ out:
> > extern void __gcov_flush(void) __attribute__((weak));
> > void __gcov_flush(void) {}
> >
> [snip]
> > +out:
> > + close(client);
> > + free(name);
> > + return (ret);
> > +}
> > +
> > +static int send_uffd(int fd, int send_fd)
>
> We already have this code in criu :) The pie/util-fd.c
> provides send_fds and recv_fds and in include/util-pie.h
> there are helpers to send/recv a single FD using those.
I removed my own code handling the FD transfer and I am now using the
criu FD code.
> > +{
> > + struct iovec iov[1];
> > + struct msghdr msg;
> > + char buf[2];
> > + struct cmsghdr *cmptr = NULL;
> > +
>
> > @@ -2931,6 +3064,31 @@ static int sigreturn_restore(pid_t pid, CoreEntry *core)
> >
> > strncpy(task_args->comm, core->tc->comm, sizeof(task_args->comm));
> >
> > + /*
> > + * open userfaulfd FD which is to the restorer blob and
> > + * to a second process handling the userfaultfd page faults
> > + */
> > + uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
> > + pr_info("uffd %d\n", uffd);
> > +
> > + if ((listen = server_listen()) < 0) {
> > + pr_info("server_listen error");
> > + }
> > +
> > + /* accept new client request */
> > + if ((client = server_accept(listen)) < 0) {
> > + pr_info("server_accept error: %d", client);
> > + }
> > +
> > + send_uffd(client, uffd);
> > +
> > + close(listen);
> > + close(client);
> > +
> > + install_service_fd(UFFD_FD_OFF, uffd);
>
> It looks like the service descriptor is not required for uffd. Service
> fds are those used all over the code and that needs to be put into some
> "known" location in the code. The uffd one is only passed into restorer.
Okay, I am no longer installing UFFD as a service descriptor and it
still works.
> > + close(uffd);
> > +
> > + task_args->uffd = get_service_fd(UFFD_FD_OFF);
> >
> > /*
> > * Fill up per-thread data.
>
> > @@ -606,6 +639,8 @@ static int vma_remap(unsigned long src, unsigned long dst, unsigned long len)
> > pr_err("Unable to remap %lx -> %lx\n", src, dst);
> > return -1;
> > }
> > + if (flags & 0x20)
>
> Can you put named constants here instead of 0x20?
Done.
> > + enable_uffd(uffd, dst, len);
> >
> > return 0;
> > }
>
> > +
> > +static int ud_open()
> > +{
> > + int udfd;
> > + int newfd;
> > +
> > + if ((udfd = client_conn()) < 0) {
> > + pr_err("unix domain socket connection error");
> > + return (-1);
> > + }
> > +
> > + newfd = recv_fd(udfd);
> > + close(udfd);
> > +
> > + return newfd;
> > +}
> > +
> > +
> > +static unsigned long find_page(unsigned long addr)
>
> There's a seek_pagemap_page() call for this type of operation. One problem with
> it is that it only works sequentially, i.e. when subsequent calls to it pass
> constantly increasing addresses.
>
> I'd suggest fixing this seek_pagemap_page() to overcome this limitation and use
> one here. This would also be much faster than readdir() and open_page_read() for
> every single page.
I think I almost have this figured out. I am now doing:
pr->get_pagemap(pr, &iov);
seek_pagemap_page(pr, addr, true);
pr->read_page(pr, addr, buf);
As this only works for the first page I introduced a rewind_pagemap()
function with:
lseek(img_raw_fd(pr->pi), 0, SEEK_SET);
Unfortunately the next call to get_pagemap() now returns 0 instead of
the usual 1. So somehow I am missing something which is needed to rewind
the pagemap reading. Any idea how to do this better/correctly?
I can also post my current uffd patch if that makes discussion easier...
> > +{
> > + struct dirent *ent;
> > + DIR *dirp = fdopendir(dup(criu_get_image_dir()));
> > + struct page_read pr;
> > + struct iovec iov;
> > + int pid;
> > + int ret = -7;
> > + unsigned long offset = 0;
> > +
> > + while (1) {
> > + ent = readdir(dirp);
> > + if (ent == NULL) {
> > + if (errno) {
> > + pr_perror("Failed readdir, error=%d", errno);
> > + offset = -1;
> > + goto exit;
> > + }
>
> > +int uffd_listen()
> > +{
> > + struct uffd_msg msg;
> > + unsigned long offset = 0;
> > + int rc;
> > + int uffd;
> > + int uffd_flags;
> > + struct uffdio_api uffdio_api;
> > + struct uffdio_copy uffdio_copy;
> > + unsigned long ps;
> > + int pages;
> > + void *mapped_pages;
> > + __u64 flags;
> > + __u64 address;
> > +
> > + if ((uffd = ud_open()) < 0)
> > + exit(0);
> > +
> > + pr_info("uffd %d\n", uffd);
> > + uffd_flags = fcntl(uffd, F_GETFD, NULL);
> > + pr_info("uffd_flags 0x%x\n", uffd_flags);
> > + pr_info("UFFD_API 0x%llx\n", UFFD_API);
> > +
> > + uffdio_api.api = UFFD_API;
> > + uffdio_api.features = 0;
> > +
> > + if (ioctl(uffd, UFFDIO_API, &uffdio_api)) {
>
> Why do you call UFFDIO_API for the 2nd time here? It should have been
> called by restorer already.
Need to look into this.
> > + pr_err("UFFDIO_API ioctl() failed\n");
> > + return 1;
> > + }
> > +
> > + if (uffdio_api.api != UFFD_API) {
> > + pr_err("UFFDIO_API error %Lu\n", uffdio_api.api);
> > + return 1;
> > + }
> > +
> > + ps = page_size();
> > +
> > + while (1) {
Adrian
More information about the CRIU
mailing list