[CRIU] Adding pre-migration checks into criu
Adrian Reber
adrian at lisas.de
Tue Jul 12 07:57:17 PDT 2016
On Sun, Jul 10, 2016 at 06:20:02PM +0300, Pavel Emelyanov wrote:
> On 07/08/2016 05:46 PM, Adrian Reber wrote:
> >
> > Sometimes migration of a process is not possible and it is not really
> > CRIU's fault that it doesn't work.
>
> Yup :)
>
> > The simplest reasons are that the binary is not
> > available or has a different size, libraries are missing, files are missing or
>
> The above are "FS mismatch errors". BTW, we don't C/R the filesystem in CRIU, and
> this seems to cause certain problems for people. I planned to discuss this at the
> C/R miniconf at Plumbers.
>
> > cgroup structures are not available.
>
> And there's also CPU mismatch and kernel modules missing (filesystems/networking).
>
> > I was thinking of adding something to CRIU which checks all these
> > obvious reasons for restoration failures.
>
> Maybe to p.haul? We have code in p.haul that checks for CPUs being compatible
> (with the help of criu, of course).
>
> > My first idea was to add an option to make a light-dump and a light-restore.
> > It would be a checkpoint without ending the process and without dumping
> > the memory. This light-checkpoint could then be transferred to the
> > destination system to make sure all the simple and obvious errors can be
> > excluded and that the chances that the process will be restored are much
> > higher.
>
> Hm... As a bug catcher this would also work well.
>
> > Checking that the binary and the libraries exist on the destination system,
> > that they have the same size and MD5/SHA1/SHAsomething, that all
> > required files are available and that enough free memory is available
> > can easily be done in external tools, but as CRIU already knows how to
> > collect this information it seems to make sense to include this in
> > CRIU.
>
> Yes, but the FS checks will go over files that were opened/mapped at
> the time of light-dump, and these may be different at real dump time.
>
> Also, there are situations that CRIU cannot C/R (data in NL sockets or
> in-flight TCP handshake) but that tend to disappear by themselves.
>
> > Does this sound useful or maybe even completely wrong?
>
> If we can somehow distinguish permanent impossibilities (CPU mismatch,
> kernel modules missing, what else?) from temporary, that would be
> really helpful.

I thought about this some more, especially about the level at which
this should be done. I can see some parts in criu, some in crit
and some in p.haul. The problem I see with doing different checks at
different levels (criu, crit, p.haul) is that other frameworks using
criu would have to re-implement them, and that seems undesirable.

I also agree with the argument that different files might be
opened/mapped during light-dump than during the actual dump, but I think
it would be important to at least give the user a simple test to check
whether migration is likely to work.

What could I start implementing? There are already checks in CRIU that the
binary exists and has the same size. That would be something the
light-dump could do. Should it also check that all the files the process
needs to open exist and have the same size? What else?
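
To make the file checks concrete, here is a rough sketch of what an
external tool (or p.haul) could do. It is not CRIU code: the function
names and the manifest layout are made up for illustration, and the
list of paths is assumed to come from whatever the light-dump would
report. The source side records size and SHA-256 for each file, and
the destination side reports what is missing or different.

```python
import hashlib
import os


def describe_file(path):
    """Return the size and SHA-256 digest of one file (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large binaries are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return {"size": os.path.getsize(path), "sha256": digest.hexdigest()}


def build_manifest(paths):
    """Source side: record size and digest for every file the task uses."""
    return {path: describe_file(path) for path in paths}


def check_manifest(manifest):
    """Destination side: return a list of (path, reason) mismatches."""
    problems = []
    for path, expected in manifest.items():
        if not os.path.isfile(path):
            problems.append((path, "missing"))
            continue
        # Cheap size comparison first; hash only when the sizes agree.
        if os.path.getsize(path) != expected["size"]:
            problems.append((path, "size differs"))
            continue
        if describe_file(path)["sha256"] != expected["sha256"]:
            problems.append((path, "content differs"))
    return problems
```

An empty list from check_manifest() would only mean that the files seen
at light-dump time match on the destination. As Pavel noted, the set of
opened/mapped files can still change before the real dump.
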
Adrian