[CRIU] Adding pre-migration checks into criu

Adrian Reber adrian at lisas.de
Tue Jul 12 07:57:17 PDT 2016


On Sun, Jul 10, 2016 at 06:20:02PM +0300, Pavel Emelyanov wrote:
> On 07/08/2016 05:46 PM, Adrian Reber wrote:
> > 
> > Sometimes migration of a process is not possible and it is not really
> > CRIU's fault that it doesn't work.
> 
> Yup :)
> 
> > The simplest reasons are that the binary is not
> > available or has a different size, libraries are missing, files are missing or
> 
> The above are "FS mismatch errors". BTW, we don't C/R the filesystem in CRIU and
> this seems to cause certain problems for people. I planned to discuss this at
> the C/R miniconf at Plumbers.
> 
> > cgroup structures are not available.
> 
> And there are also CPU mismatches and missing kernel modules (filesystems/networking).
> 
> > I was thinking of adding something to CRIU which checks all these
> > obvious reasons for restoration failures.
> 
> Maybe to p.haul? We have code in p.haul that checks for CPUs being compatible
> (with the help of criu, of course).
> 
> > My first idea was to add an option to make a light-dump and a light-restore.
> > It would be a checkpoint without ending the process and without dumping
> > the memory. This light-checkpoint could then be transferred to the
> > destination system to make sure that all the simple and obvious errors
> > can be ruled out, so that the chances of the process being restored are
> > much higher.
> 
> Hm... This would also work well as a bug catcher.
> 
> > Checking that the binary and the libraries exist on the destination system,
> > that they have the same size and MD5/SHA1/SHAsomething, that all
> > required files are available and that enough free memory is available
> > can easily be done in external tools, but as CRIU already knows how to
> > collect this information it seems to make sense to include this in
> > CRIU.
> 
> Yes, but FS checks will only go over the files that were opened/mapped at
> the time of the light-dump, and these may not be the same at real dump time.
> 
> Also, there are situations that CRIU cannot C/R (data in netlink sockets or
> an in-flight TCP handshake) but that tend to disappear by themselves.
> 
> > Does this sound useful or maybe even completely wrong?
> 
> If we can somehow distinguish permanent impossibilities (CPU mismatch,
> missing kernel modules, what else?) from temporary ones, that would be
> really helpful.
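
The permanent/temporary distinction above could be expressed as a small
classification step. The sketch below is hypothetical: the enum values and
classify() are illustrative names, not existing CRIU code, and treating an
unknown reason as permanent is an assumption (the conservative choice, so
a caller does not retry forever).

```c
/* Hypothetical failure reasons a pre-migration check might report.
 * These names are illustrative only and are not part of CRIU. */
enum precheck_failure {
	PRECHECK_CPU_MISMATCH,
	PRECHECK_KMOD_MISSING,
	PRECHECK_NETLINK_DATA,
	PRECHECK_TCP_HANDSHAKE,
};

enum precheck_kind {
	PRECHECK_PERMANENT,	/* retrying on this host pair will not help */
	PRECHECK_TEMPORARY,	/* tends to disappear by itself; retry later */
};

static enum precheck_kind classify(enum precheck_failure f)
{
	switch (f) {
	case PRECHECK_CPU_MISMATCH:
	case PRECHECK_KMOD_MISSING:
		return PRECHECK_PERMANENT;
	case PRECHECK_NETLINK_DATA:
	case PRECHECK_TCP_HANDSHAKE:
		return PRECHECK_TEMPORARY;
	}
	/* Assumption: unknown reasons are treated as permanent. */
	return PRECHECK_PERMANENT;
}
```

A caller such as a migration tool could then abort on PRECHECK_PERMANENT
and schedule a retry on PRECHECK_TEMPORARY.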

I thought about this some more, especially about the level at which
this should be done. I can see some parts in criu, some in crit
and some in p.haul. The problem I see with doing different checks at
different levels (criu, crit, p.haul) is that other frameworks using
criu would have to re-implement them, and that seems undesirable.

I also agree with the argument that different files might be
opened/mapped during the light-dump than during the actual dump, but I
think it is important to at least give the user a simple test to check
whether the migration is likely to work.

What could I start implementing? There are already checks in CRIU whether
the binary exists and has the same size. That would be something the
light-dump could do. It could also check whether all the files it needs to
open exist and have the same size. What else?
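
A check of the kind described above (file exists, same size) might look
roughly like the following sketch. It assumes a hypothetical light-dump
record holding each opened/mapped path and its size on the source host;
struct light_file and files_match() are illustrative names, not existing
CRIU code. A size match is only a cheap heuristic and, as noted earlier
in the thread, the real dump may touch different files.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical light-dump record: a path that was opened or mapped
 * at light-dump time, plus its size on the source host. */
struct light_file {
	const char *path;
	off_t size;
};

/* Returns true if every recorded file exists on this (destination)
 * host as a regular file with the same size. Equal sizes do not prove
 * the contents are identical; a hash comparison would be stronger. */
static bool files_match(const struct light_file *files, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct stat st;

		if (stat(files[i].path, &st) != 0)
			return false;	/* missing or inaccessible */
		if (!S_ISREG(st.st_mode) || st.st_size != files[i].size)
			return false;	/* not a regular file, or size differs */
	}
	return true;
}
```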

		Adrian

