[CRIU] Preserve the "dumpable" flag on criu dump/restore.

Filipe Brandenburger filbranden at google.com
Mon May 12 19:46:08 PDT 2014


Ok, after zeroing the pages that were COWed but were not returned by
pr.get_pagemap (these are effectively zero pages, which means the
range should probably not have been COWed in the first place), I got
all the tests to pass *except* for the file-backed test.
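
To make the zeroing step concrete, here is a minimal sketch of the idea.
It is not CRIU code: zero_missing_pages(), the present[] array and
SKETCH_PAGE_SIZE are all hypothetical names, and it assumes we already
know which pages of the region the pagemap reader returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size for this sketch */

/*
 * Hypothetical helper, not a CRIU function. present[i] says whether
 * page i of the region was returned by the pagemap reader. Every page
 * that was not returned is treated as a zero page and cleared, so data
 * inherited from the parent's mapping cannot survive in the child.
 * Returns the number of pages that were zeroed.
 */
static size_t zero_missing_pages(unsigned char *region, size_t nr_pages,
				 const bool *present)
{
	size_t i, zeroed = 0;

	assert(region != NULL && present != NULL);

	for (i = 0; i < nr_pages; i++) {
		if (present[i])
			continue;	/* contents come from the image */
		memset(region + i * SKETCH_PAGE_SIZE, 0, SKETCH_PAGE_SIZE);
		zeroed++;
	}
	return zeroed;
}
```

In the real restore path the "present" information would have to come
from walking the pagemap entries rather than from a ready-made array.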

I'm looking at the file test and I'm wondering how it is supposed to
pass in the first place... If both parent and child mmap the same
file and the parent then updates a page in the file, how is the child
expected to know what to checksum it against? I think the test is
bogus there...

One solution I was thinking of is putting the file_tcs struct on a shm
page; maybe that would work... It would still have race conditions,
though.
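
A rough sketch of the shm-page idea, under assumptions: struct
shared_state and both function names are hypothetical stand-ins (I am
not reproducing the real file_tcs layout here), and the pipe is only
there to order the parent's write before the child's read in this toy
case. It does not address the races in the actual test.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for MAP_ANONYMOUS on strict compilers */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the test's checksum bookkeeping. */
struct shared_state {
	uint32_t crc;
};

/* Allocate the struct on an anonymous shared page; NULL on failure. */
static struct shared_state *alloc_shared_state(void)
{
	void *p = mmap(NULL, sizeof(struct shared_state),
		       PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/*
 * Fork a child, update st->crc in the parent *after* the fork, then
 * let the child check the value. Returns 1 if the child saw the
 * parent's update, 0 if it did not, -1 on error.
 */
static int child_sees_parent_update(struct shared_state *st, uint32_t crc)
{
	int pfd[2], status;
	char go = 'g';
	ssize_t w;
	pid_t pid;

	assert(st != NULL);

	if (pipe(pfd) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(pfd[0]);
		close(pfd[1]);
		return -1;
	}
	if (pid == 0) {
		close(pfd[1]);
		/* Block until the parent signals that crc is published. */
		if (read(pfd[0], &go, 1) != 1)
			_exit(2);
		_exit(st->crc == crc ? 0 : 1);
	}

	close(pfd[0]);
	st->crc = crc;			/* visible to the child: MAP_SHARED */
	w = write(pfd[1], &go, 1);
	close(pfd[1]);

	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (w != 1 || !WIFEXITED(status) || WEXITSTATUS(status) == 2)
		return -1;
	return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

With a private (non-shared) page the child would keep seeing the value
from fork time, which is why the struct has to live on a shared page.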

I'm far from producing a patch that could fix the COW behavior, but I
hope these pointers from my investigation help you guys produce one.
Let me know if I can help further with this.

Cheers,
Filipe


On Mon, May 12, 2014 at 6:55 PM, Filipe Brandenburger
<filbranden at google.com> wrote:
> Hi,
>
> On Mon, May 12, 2014 at 5:01 PM, Filipe Brandenburger
> <filbranden at google.com> wrote:
>> Setting paddr = decode_pointer(p->premmaped_addr); instead seems to help...
>>
>> After doing that I get many pages COWed:
>>
>> (00.013729)  17521: nr_restored_pages: 182
>> (00.013731)  17521: nr_compared_pages: 370
>> (00.013733)  17521: nr_shared_pages:   188
>> (00.013735)  17521: nr_droped_pages:   35
>>
>> However, that introduces many fails like the one below:
>>
>> 16:59:15.995: 17521: FAIL: cow01.c:151: 1: 0x2b35fff8f000 data mismatch
>>
>> Will keep looking.
>
> So after using paddr = decode_pointer(p->premmaped_addr); the (first)
> problem seems to be in the case where the parent and child share a
> vma in the same address space but it's not really COWed. With that
> patch in place, it will try to COW it at first, which should be fine.
> The problem then is that restore_priv_vma_content() will skip the
> zero pages, and those pages will not be reset from the parent...
>
> It seems the fix for that is not that simple and requires some deeper
> changes to that function to detect whether zero pages were added in
> the COWed case, in which case it should zero them out (or mmap them
> again?)
>
> That doesn't seem to be the only problem though. I also seem to be
> having problems on the other runs when the bit with value 8 is set
> (a_f_write_parent?) and all other bits are unset, except for the bit
> with value 2 (b_f_read?), which can be either set or unset... Not
> sure if I'm reading the bits right though.
>
> Cheers,
> Filipe

