[CRIU] [PATCH v1] shmem: use mincore page status bit with anon shared mem dumping

Eugene Batalov eabatalov89 at gmail.com
Sun Aug 28 08:56:53 PDT 2016


Hi Mike,

Ok. Thank you for the correction. I'll send v2 where I'll use "anon shmem"
soon.

2016-08-28 17:46 GMT+03:00 Mike Rapoport <rppt at linux.vnet.ibm.com>:

> Hi Eugene,
>
> On Sun, Aug 28, 2016 at 01:23:23AM +0300, Eugene Batalov wrote:
> > It turned out that ashmem can have pages with non zero content
>
> Just a small nitpick about the "ashmem" term.
> There is an "ashmem" driver used by Android, which also implements
> anonymous shared memory, but in slightly different way.
> AFAIK, CRIU does not support the Android ashmem, so, IMHO, it'll be better
> to use "anon shmem" to avoid mis-undestandings...
>
> > and with both PME_PRESENT and PME_SWAP bits unset in all its vmas
> > in the whole ps tree.
> > Such case is reproduced in issue #209:
> > 1. Dump ps tree with ashmem filled using datagen.
> > 2. Restore ps tree. ashmem content is restored
> >    in open_shmem(). fd is created for it and it is
> >    unmapped from restorer process.
> > 3. ashmem vma is mapped in restore_mapping() of pie restorer context.
> >    ashmem content is already initialized to non zero content
> >    but restored process doesn't touch its newly mapped vma.
> > 4. Run CRIU dump again. All the pages of ashmem vmas have
> >    PME_PRESENT and PME_SWAP bits unset and we don't put
> >    vma pages to dump.
> >
> > So if we filter ashmem pages using PME_PRESENT and PME_SWAP bits
> > the same way as we do it for anon private mem then we have a bug.
> > PME_PRESENT and PME_SWAP bits work for anon private mem because
> > at least one process would restore content of private anon vma
> > in its own address space thus PME bits will be set and pages
> > will be damped.
> >
> > We can't just stop using PME_PRESENT and PME_SWAP bits and dump all
> > non soft dirty and non zero pfn pages. In this case each 1Gb of
> > mapped and not used ashmem vma will go to dump. This is too bad.
> >
> > To fix the bug in this patch we use mincore bits to finally
> > understand should we dump page or not. mincore bits show page
> > usage status better because mincore performs deeper checking of
> > internal in-kernel state. PME bits filling is based only on
> > process page table.
> >
> > Using mincore has a drawback. It doesn't work when page is in swap.
> > But it's ok for now because mincore was used before we started using
> > PME bits. Also mincore doesn't break page changes tracking
> > functionality for ashmem that we have now.
> >
> > This bug can be fixed in another way. For example we can make ashmem
> > restoration work similar to anon private mem restoration.
> > But this fix looks much harder to implement.
> >
> > Signed-off-by: Eugene Batalov <eabatalov89 at gmail.com>
> > ---
> >  criu/shmem.c | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/criu/shmem.c b/criu/shmem.c
> > index a21e8c1..3a2e4d1 100644
> > --- a/criu/shmem.c
> > +++ b/criu/shmem.c
> > @@ -607,6 +607,7 @@ static int dump_one_shmem(struct shmem_info *si)
> >       struct page_pipe *pp;
> >       struct page_xfer xfer;
> >       int err, ret = -1, fd;
> > +     unsigned char *mc_map = NULL;
> >       void *addr = NULL;
> >       unsigned long pfn, nrpages;
> >
> > @@ -625,6 +626,14 @@ static int dump_one_shmem(struct shmem_info *si)
> >       }
> >
> >       nrpages = (si->size + PAGE_SIZE - 1) / PAGE_SIZE;
> > +     mc_map = xmalloc(nrpages * sizeof(*mc_map));
> > +     if (!mc_map)
> > +             goto err_unmap;
> > +     /* We can't rely only on PME bits for ashmem */
> > +     err = mincore(addr, si->size, mc_map);
> > +     if (err)
> > +             goto err_unmap;
> > +
> >       pp = create_page_pipe((nrpages + 1) / 2, NULL, PP_CHUNK_MODE);
> >       if (!pp)
> >               goto err_unmap;
> > @@ -638,7 +647,7 @@ static int dump_one_shmem(struct shmem_info *si)
> >               unsigned long pgaddr;
> >
> >               pgstate = get_pstate(si->pstate_map, pfn);
> > -             if (pgstate == PST_DONT_DUMP)
> > +             if ((pgstate == PST_DONT_DUMP) && !(mc_map[pfn] &
> PAGE_RSS))
> >                       continue;
> >
> >               pgaddr = (unsigned long)addr + pfn * PAGE_SIZE;
> > @@ -669,6 +678,7 @@ err_pp:
> >  err_unmap:
> >       munmap(addr,  si->size);
> >  err:
> > +     xfree(mc_map);
> >       return ret;
> >  }
> >
> > --
> > 1.9.1
> >
> > _______________________________________________
> > CRIU mailing list
> > CRIU at openvz.org
> > https://lists.openvz.org/mailman/listinfo/criu
> >
>
>


-- 
Best regards,
Eugene Batalov.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openvz.org/pipermail/criu/attachments/20160828/8b15b1eb/attachment.html>


More information about the CRIU mailing list