[CRIU] [PATCH v1] shmem: use mincore page status bit with anon shared mem dumping
Eugene Batalov
eabatalov89 at gmail.com
Sat Aug 27 15:23:23 PDT 2016
It turned out that ashmem can have pages with non zero content
and with both PME_PRESENT and PME_SWAP bits unset in all its vmas
in the whole ps tree.
Such case is reproduced in issue #209:
1. Dump ps tree with ashmem filled using datagen.
2. Restore ps tree. ashmem content is restored
in open_shmem(). fd is created for it and it is
unmapped from restorer process.
3. ashmem vma is mapped in restore_mapping() of pie restorer context.
ashmem content is already initialized to non zero content
but restored process doesn't touch its newly mapped vma.
4. Run CRIU dump again. All the pages of ashmem vmas have
PME_PRESENT and PME_SWAP bits unset and we don't put
vma pages to dump.
So if we filter ashmem pages using PME_PRESENT and PME_SWAP bits
the same way as we do it for anon private mem then we have a bug.
PME_PRESENT and PME_SWAP bits work for anon private mem because
at least one process would restore content of private anon vma
in its own address space thus PME bits will be set and pages
will be damped.
We can't just stop using PME_PRESENT and PME_SWAP bits and dump all
non soft dirty and non zero pfn pages. In this case each 1Gb of
mapped and not used ashmem vma will go to dump. This is too bad.
To fix the bug in this patch we use mincore bits to finally
understand should we dump page or not. mincore bits show page
usage status better because mincore performs deeper checking of
internal in-kernel state. PME bits filling is based only on
process page table.
Using mincore has a drawback. It doesn't work when page is in swap.
But it's ok for now because mincore was used before we started using
PME bits. Also mincore doesn't break page changes tracking
functionality for ashmem that we have now.
This bug can be fixed in another way. For example we can make ashmem
restoration work similar to anon private mem restoration.
But this fix looks much harder to implement.
Signed-off-by: Eugene Batalov <eabatalov89 at gmail.com>
---
criu/shmem.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/criu/shmem.c b/criu/shmem.c
index a21e8c1..3a2e4d1 100644
--- a/criu/shmem.c
+++ b/criu/shmem.c
@@ -607,6 +607,7 @@ static int dump_one_shmem(struct shmem_info *si)
struct page_pipe *pp;
struct page_xfer xfer;
int err, ret = -1, fd;
+ unsigned char *mc_map = NULL;
void *addr = NULL;
unsigned long pfn, nrpages;
@@ -625,6 +626,14 @@ static int dump_one_shmem(struct shmem_info *si)
}
nrpages = (si->size + PAGE_SIZE - 1) / PAGE_SIZE;
+ mc_map = xmalloc(nrpages * sizeof(*mc_map));
+ if (!mc_map)
+ goto err_unmap;
+ /* We can't rely only on PME bits for ashmem */
+ err = mincore(addr, si->size, mc_map);
+ if (err)
+ goto err_unmap;
+
pp = create_page_pipe((nrpages + 1) / 2, NULL, PP_CHUNK_MODE);
if (!pp)
goto err_unmap;
@@ -638,7 +647,7 @@ static int dump_one_shmem(struct shmem_info *si)
unsigned long pgaddr;
pgstate = get_pstate(si->pstate_map, pfn);
- if (pgstate == PST_DONT_DUMP)
+ if ((pgstate == PST_DONT_DUMP) && !(mc_map[pfn] & PAGE_RSS))
continue;
pgaddr = (unsigned long)addr + pfn * PAGE_SIZE;
@@ -669,6 +678,7 @@ err_pp:
err_unmap:
munmap(addr, si->size);
err:
+ xfree(mc_map);
return ret;
}
--
1.9.1
More information about the CRIU
mailing list