[CRIU] Adding pre-migration checks into criu

Pavel Emelyanov xemul at virtuozzo.com
Thu Jul 14 03:46:17 PDT 2016


On 07/12/2016 05:57 PM, Adrian Reber wrote:
> On Sun, Jul 10, 2016 at 06:20:02PM +0300, Pavel Emelyanov wrote:
>> On 07/08/2016 05:46 PM, Adrian Reber wrote:
>>>
>>> Sometimes migration of a process is not possible and it is not really
>>> CRIU's fault that it doesn't work.
>>
>> Yup :)
>>
>>> Simplest reasons are the binary is not
>>> available or different size, libraries are missing, files are missing or
>>
>> The above are "FS mismatch errors". BTW, we don't C/R the filesystem in CRIU
>> and this seems to cause certain problems for people. I planned to discuss this
>> at the C/R miniconf at Plumbers.
>>
>>> cgroup structures are not available.
>>
>> And there's also CPU mismatch and kernel modules missing (filesystems/networking).
>>
>>> I was thinking of adding something to CRIU which checks all these
>>> obvious reasons for restoration failures.
>>
>> Maybe to p.haul? We have code in p.haul that checks for CPUs being compatible
>> (with the help of criu, of course).
>>
>>> My first idea was to add an option to make a light-dump and a light-restore.
>>> It would be a checkpoint without ending the process and without dumping
>>> the memory. This light-checkpoint could then be transferred to the
>>> destination system to make sure all the simple and obvious errors can be
>>> excluded and that the chances that the process will be restored are much
>>> higher.
>>
>> Hm... As a bug catcher this would also work well.
>>
>>> Checking that the binary and the libraries exist on the destination system,
>>> that they have the same size and MD5/SHA1/SHAsomething, that all
>>> required files are available and that enough free memory is available
>>> can easily be done in external tools, but as CRIU already knows how to
>>> collect this information it seems to make sense to include this in
>>> CRIU.
>>
>> Yes, but the FS checks will go over files that were opened/mapped at
>> the time of light-dump, and these may differ at real dump time.
>>
>> Also, there are situations that CRIU cannot C/R (data in NL sockets or
>> in-flight TCP handshake) but that tend to disappear by themselves.
>>
>>> Does this sound useful or maybe even completely wrong?
>>
>> If we can somehow distinguish permanent impossibilities (CPU mismatch,
>> kernel modules missing, what else?) from temporary, that would be
>> really helpful.
> 
> I thought about this some more. Especially about the topic at which
> level this should be done. I can see some parts in criu, some in crit
> and some in p.haul. The problem I see with doing different checks at
> different levels (criu, crit, p.haul) is that other frameworks using
> criu will re-implement it and that seems undesirable.
> 
> I also agree with the argument that different files might be
> opened/mapped during light-dump than during the actual dump, but I think
> it would be important to have something to at least give the user a
> chance to have a simple test to check if it works.

Yup, agreed.

> What could I start implementing? There are already checks in CRIU if the
> binary exists and has the same size. That would be something the
> light-dump could do. If all the files it needs to open exist and if they
> have the same size? What else?

We also have file mode checks... Something around VDSO... CGroups... Modules
for net devices (veth, bridges, etc.)... Hm... It looks like doing
dump --nomem and restore --nomem would be really helpful: we wouldn't
have to write the needed checks individually, the restore code would
just be re-used.
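
For illustration only, here is roughly what the per-file part of such
checks (existence, size, mode) would look like if written by hand on the
destination side. This is a minimal sketch: the FileRecord manifest and
the check_files() helper are hypothetical names, not CRIU's image format
or API, and it covers none of the VDSO, cgroup or kernel module cases
that re-using the restore path would get for free.

```python
import os
import stat
from dataclasses import dataclass
from typing import List


@dataclass
class FileRecord:
    """Hypothetical manifest entry recorded on the source host."""
    path: str
    size: int
    mode: int  # permission bits only, e.g. 0o755


def check_files(records: List[FileRecord]) -> List[str]:
    """Return human-readable mismatches found on the local (destination) host."""
    problems = []
    for rec in records:
        try:
            st = os.stat(rec.path)
        except OSError as err:
            # Missing or unreadable file: report it and skip the other checks.
            problems.append(f"{rec.path}: not accessible ({err.strerror})")
            continue
        if st.st_size != rec.size:
            problems.append(
                f"{rec.path}: size {st.st_size}, expected {rec.size}")
        actual_mode = stat.S_IMODE(st.st_mode)
        if actual_mode != rec.mode:
            problems.append(
                f"{rec.path}: mode {oct(actual_mode)}, expected {oct(rec.mode)}")
    return problems
```

An empty list means none of these particular mismatches were found; it
says nothing about transient conditions at real dump time.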

-- Pavel

