<div dir="ltr">Hi Mike,<div><br></div><div>Ok. Thank you for the correction. I'll send v2 where I'll use <span style="font-size:12.8px">"anon shmem" soon.</span></div></div><div class="gmail_extra"><br><div class="gmail_quote">2016-08-28 17:46 GMT+03:00 Mike Rapoport <span dir="ltr"><<a href="mailto:rppt@linux.vnet.ibm.com" target="_blank">rppt@linux.vnet.ibm.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Eugene,<br>
<span class=""><br>
On Sun, Aug 28, 2016 at 01:23:23AM +0300, Eugene Batalov wrote:<br>
> It turned out that ashmem can have pages with non zero content<br>
<br>
</span>Just a small nitpick about the "ashmem" term.<br>
There is an "ashmem" driver used by Android, which also implements<br>
anonymous shared memory, but in slightly different way.<br>
AFAIK, CRIU does not support the Android ashmem, so, IMHO, it'll be better<br>
to use "anon shmem" to avoid mis-undestandings...<br>
<div><div class="h5"><br>
> and with both PME_PRESENT and PME_SWAP bits unset in all its vmas<br>
> in the whole ps tree.<br>
> Such case is reproduced in issue #209:<br>
> 1. Dump ps tree with ashmem filled using datagen.<br>
> 2. Restore ps tree. ashmem content is restored<br>
> in open_shmem(). fd is created for it and it is<br>
> unmapped from restorer process.<br>
> 3. ashmem vma is mapped in restore_mapping() of pie restorer context.<br>
> ashmem content is already initialized to non zero content<br>
> but restored process doesn't touch its newly mapped vma.<br>
> 4. Run CRIU dump again. All the pages of ashmem vmas have<br>
> PME_PRESENT and PME_SWAP bits unset and we don't put<br>
> vma pages to dump.<br>
><br>
> So if we filter ashmem pages using PME_PRESENT and PME_SWAP bits<br>
> the same way as we do it for anon private mem then we have a bug.<br>
> PME_PRESENT and PME_SWAP bits work for anon private mem because<br>
> at least one process would restore content of private anon vma<br>
> in its own address space thus PME bits will be set and pages<br>
> will be damped.<br>
><br>
> We can't just stop using PME_PRESENT and PME_SWAP bits and dump all<br>
> non soft dirty and non zero pfn pages. In this case each 1Gb of<br>
> mapped and not used ashmem vma will go to dump. This is too bad.<br>
><br>
> To fix the bug in this patch we use mincore bits to finally<br>
> understand should we dump page or not. mincore bits show page<br>
> usage status better because mincore performs deeper checking of<br>
> internal in-kernel state. PME bits filling is based only on<br>
> process page table.<br>
><br>
> Using mincore has a drawback. It doesn't work when page is in swap.<br>
> But it's ok for now because mincore was used before we started using<br>
> PME bits. Also mincore doesn't break page changes tracking<br>
> functionality for ashmem that we have now.<br>
><br>
> This bug can be fixed in another way. For example we can make ashmem<br>
> restoration work similar to anon private mem restoration.<br>
> But this fix looks much harder to implement.<br>
><br>
> Signed-off-by: Eugene Batalov <<a href="mailto:eabatalov89@gmail.com">eabatalov89@gmail.com</a>><br>
> ---<br>
> criu/shmem.c | 12 +++++++++++-<br>
> 1 file changed, 11 insertions(+), 1 deletion(-)<br>
><br>
> diff --git a/criu/shmem.c b/criu/shmem.c<br>
> index a21e8c1..3a2e4d1 100644<br>
> --- a/criu/shmem.c<br>
> +++ b/criu/shmem.c<br>
> @@ -607,6 +607,7 @@ static int dump_one_shmem(struct shmem_info *si)<br>
> struct page_pipe *pp;<br>
> struct page_xfer xfer;<br>
> int err, ret = -1, fd;<br>
> + unsigned char *mc_map = NULL;<br>
> void *addr = NULL;<br>
> unsigned long pfn, nrpages;<br>
><br>
> @@ -625,6 +626,14 @@ static int dump_one_shmem(struct shmem_info *si)<br>
> }<br>
><br>
> nrpages = (si->size + PAGE_SIZE - 1) / PAGE_SIZE;<br>
> + mc_map = xmalloc(nrpages * sizeof(*mc_map));<br>
> + if (!mc_map)<br>
> + goto err_unmap;<br>
> + /* We can't rely only on PME bits for ashmem */<br>
> + err = mincore(addr, si->size, mc_map);<br>
> + if (err)<br>
> + goto err_unmap;<br>
> +<br>
> pp = create_page_pipe((nrpages + 1) / 2, NULL, PP_CHUNK_MODE);<br>
> if (!pp)<br>
> goto err_unmap;<br>
> @@ -638,7 +647,7 @@ static int dump_one_shmem(struct shmem_info *si)<br>
> unsigned long pgaddr;<br>
><br>
> pgstate = get_pstate(si->pstate_map, pfn);<br>
> - if (pgstate == PST_DONT_DUMP)<br>
> + if ((pgstate == PST_DONT_DUMP) && !(mc_map[pfn] & PAGE_RSS))<br>
> continue;<br>
><br>
> pgaddr = (unsigned long)addr + pfn * PAGE_SIZE;<br>
> @@ -669,6 +678,7 @@ err_pp:<br>
> err_unmap:<br>
> munmap(addr, si->size);<br>
> err:<br>
> + xfree(mc_map);<br>
> return ret;<br>
> }<br>
><br>
> --<br>
> 1.9.1<br>
><br>
</div></div>> ______________________________<wbr>_________________<br>
> CRIU mailing list<br>
> <a href="mailto:CRIU@openvz.org">CRIU@openvz.org</a><br>
> <a href="https://lists.openvz.org/mailman/listinfo/criu" rel="noreferrer" target="_blank">https://lists.openvz.org/<wbr>mailman/listinfo/criu</a><br>
><br>
<br>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature">Best regards,<br>Eugene Batalov.</div>
</div>