[CRIU] Adding pre-migration checks into criu

Pavel Emelyanov xemul at virtuozzo.com
Sun Jul 10 08:20:02 PDT 2016


On 07/08/2016 05:46 PM, Adrian Reber wrote:
> 
> Sometimes migration of a process is not possible and it is not really
> CRIU's fault that it doesn't work.

Yup :)

> Simplest reasons are the binary is not
> available or different size, libraries are missing, files are missing or

The above are "FS mismatch errors". BTW, CRIU doesn't C/R the filesystem, and
this seems to cause problems for people. I planned to discuss this at the
C/R miniconf at Plumbers.

> cgroup structures are not available.

And there are also CPU mismatches and missing kernel modules (filesystems/networking).

> I was thinking of adding something to CRIU which checks all these
> obvious reasons for restoration failures.

Maybe to p.haul? We have code in p.haul that checks for CPUs being compatible
(with the help of criu, of course).
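
To illustrate the kind of CPU compatibility check meant here, below is a
minimal sketch that compares feature flags taken from /proc/cpuinfo-style text.
It is not the actual p.haul or CRIU code; the function names are made up for
illustration, and it assumes x86-style "flags" lines and that all cores of a
host report the same flags.

```python
def cpu_flags(cpuinfo_text):
    """Collect feature flags from /proc/cpuinfo-style text.

    Assumption: x86 layout, where each CPU has a "flags : ..." line.
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def missing_cpu_flags(source_cpuinfo, destination_cpuinfo):
    """Return the flags present on the source but absent on the destination.

    An empty list means the destination offers at least the source's features.
    """
    return sorted(cpu_flags(source_cpuinfo) - cpu_flags(destination_cpuinfo))
```

The source side would read its own /proc/cpuinfo, ship the text over, and the
destination would refuse the migration if the returned list is not empty.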

> My first idea was to add an option to make a light-dump and a light-restore.
> It would be checkpoint without ending the process and without dumping
> the memory. This light-checkpoint could then be transferred to the
> destination system to make sure all the simple and obvious errors can be
> excluded and that the chances that the process will be restored are much
> higher.

Hm... This would also work well as a bug catcher.

> That the binary and the libraries do exist on the destination system,
> that they have the same size and MD5/SHA1/SHAsomething, that all
> required files are available and that enough free memory is available
> can easily be done in external tools, but as CRIU already knows how to
> collect this information it seems to make sense to include this in
> CRIU.

Yes, but the FS checks would only cover files that were opened/mapped at
the time of the light-dump, and that set can be different at real dump time.
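
For reference, the size/checksum part of such an FS check could look like the
sketch below. It is only an illustration with hypothetical helper names, not
existing CRIU code; it assumes the list of paths comes from somewhere else
(e.g. the light-dump) and that paths are identical on both hosts.

```python
import hashlib
import os


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size, sha256 hex digest) of a regular file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large binaries are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


def build_manifest(paths):
    """Source side: map every path to its size and checksum."""
    return {path: file_fingerprint(path) for path in paths}


def check_manifest(manifest):
    """Destination side: return a list of (path, problem) mismatches."""
    problems = []
    for path, (size, sha) in manifest.items():
        if not os.path.isfile(path):
            problems.append((path, "missing"))
            continue
        local_size, local_sha = file_fingerprint(path)
        if local_size != size:
            problems.append((path, "size differs"))
        elif local_sha != sha:
            problems.append((path, "checksum differs"))
    return problems
```

An empty result only says that the files seen at manifest time match; it says
nothing about files the process opens or maps later.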

Also, there are situations that CRIU cannot C/R (data in netlink sockets or
an in-flight TCP handshake) but that tend to go away by themselves.

> Does this sound useful or maybe even completely wrong?

If we can somehow distinguish permanent impossibilities (CPU mismatch,
kernel modules missing, what else?) from temporary ones, that would be
really helpful.
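
A minimal sketch of how a caller might act on that distinction is below. The
failure labels are hypothetical names for the cases mentioned in this mail,
not identifiers CRIU reports, and treating unknown failures as permanent is
an assumption made to stay on the safe side.

```python
from enum import Enum


class Kind(Enum):
    PERMANENT = "permanent"   # will not go away by waiting
    TEMPORARY = "temporary"   # tends to disappear by itself


# Hypothetical labels for the cases named in this thread.
FAILURE_KINDS = {
    "cpu_mismatch": Kind.PERMANENT,
    "kernel_module_missing": Kind.PERMANENT,
    "netlink_socket_has_data": Kind.TEMPORARY,
    "tcp_handshake_in_flight": Kind.TEMPORARY,
}


def triage(failures):
    """Decide what to do with a list of pre-migration check failures.

    Returns ("abort", permanent), ("retry", temporary) or ("proceed", []).
    Unknown failures are treated as permanent (assumption).
    """
    permanent = [f for f in failures
                 if FAILURE_KINDS.get(f, Kind.PERMANENT) is Kind.PERMANENT]
    temporary = [f for f in failures
                 if FAILURE_KINDS.get(f, Kind.PERMANENT) is Kind.TEMPORARY]
    if permanent:
        return "abort", permanent
    if temporary:
        return "retry", temporary
    return "proceed", []
```

With this split, a migration tool could give up at once on a CPU mismatch but
simply wait and re-check when only a TCP handshake is in the way.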

-- Pavel


