[CRIU] [PATCH 2/2] Make lacking cgroup properties non-fatal

Pavel Emelyanov xemul at parallels.com
Wed Aug 13 22:04:32 PDT 2014


On 08/14/2014 01:46 AM, Garrison Bellack wrote:
> On Wed, Aug 13, 2014 at 1:55 PM, Tycho Andersen <tycho.andersen at canonical.com> wrote:
> 
>     On Wed, Aug 13, 2014 at 01:32:01PM -0700, Garrison Bellack wrote:
>     > On Wed, Aug 13, 2014 at 12:43 PM, Andrew Vagin <avagin at parallels.com> wrote:
>     >
>     > > On Wed, Aug 13, 2014 at 11:59:33AM -0700, gbellack at google.com wrote:
>     > > > From: Garrison Bellack <gbellack at google.com>
>     > > >
>     > > > Because different kernel versions have different cgroup properties, criu
>     > > > shouldn't crash just because the properties statically listed aren't exact.
>     > > > Instead, during dump, ignore properties the kernel doesn't have and continue.
>     > > > In addition, during restore, in the event of migration to a kernel that is
>     > > > missing a property that was dumped, print an error but continue.
>     > >
>     > > I am not sure that we can continue in this case. Why do you think that
>     > > it's safe? I think we need to ask a user about this. So can we add a new
>     > > option to allow continuing in this case?
>     >
>     >
>     > I'm a little confused about your concerns of safety. Are you referring to
>     > missing properties on the dump or restore side?
> 
>     I think restore side. I also wondered the same thing -- it seems that
>     missing properties on restore should be an error (perhaps as Andrew
>     says with an --allow-missing-cg-props flag on the restore side to
>     allow you to override this error).
> 
> 
> Here is the reasoning behind this decision. Let's say you are migrating properties A, B, C, D where B doesn't exist.
> 
> This is how it currently works:
> A will be restored. B prints an error and causes failure because it is missing. However, because cgroup property restoration is the last thing to happen during restore, and because of designs further up the call chain, the restoration of the process still goes through. C and D are not restored because of the earlier criu failure.
> End result -- Process and property A restored; C and D are not restored even though they exist on the new machine.
> 
> With this patch:
> A will be restored. B prints an error and continues. C and D are restored.
> End result -- Process and properties A, C, D all restored.
> 
> I think between these two options we clearly prefer the latter behavior.

Absolutely. AFAIU, Andrew and Tycho meant that there could be a third behavior -- once B
fails, we abort the whole restore. And they proposed an option for controlling this.
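
To make the three behaviors concrete, here is a rough Python sketch. It is not CRIU
code: the names Policy, RestoreAborted and restore_props are made up for illustration,
and a plain set stands in for the properties the target kernel actually has.

```python
from enum import Enum


class Policy(Enum):
    STOP_AT_FIRST = 1  # current behavior: stop restoring properties, process survives
    SKIP_MISSING = 2   # with the patch: print an error and keep going
    ABORT_RESTORE = 3  # third option: fail the whole restore


class RestoreAborted(Exception):
    """Hypothetical signal that the whole restore must be aborted."""


def restore_props(dumped, available, policy):
    """Return the names of the properties that were actually restored.

    dumped    -- property names from the image, in restore order
    available -- set of property names present on the target kernel
    """
    restored = []
    for name in dumped:
        if name in available:
            restored.append(name)
            continue
        print(f"Error: cgroup property {name} is missing")
        if policy is Policy.STOP_AT_FIRST:
            break
        if policy is Policy.ABORT_RESTORE:
            raise RestoreAborted(name)
        # Policy.SKIP_MISSING: fall through to the next property
    return restored
```

With dumped = ["A", "B", "C", "D"] and available = {"A", "C", "D"}, STOP_AT_FIRST
restores only A, SKIP_MISSING restores A, C and D, and ABORT_RESTORE raises instead
of returning anything.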

I have a question about this -- what does the LXC tool do if it finds a cgroup configuration
parameter in a container's config that is missing in the kernel? And what does Google's
container engine (presumably this is lmctfy) do in that case?

If we draw an analogy with e.g. net namespace configuration -- if some feature is missing
(a sysctl or a net device option/flag/whatever), the namespace restore would be aborted, and so
would the whole restore procedure.

Thanks,
Pavel


