[CRIU] Maintain long running HPC jobs across sysupgrades using CRIU?
Stefan Kombrink
stefan.kombrink at uni-ulm.de
Mon Apr 10 01:50:21 PDT 2017
Hi Pavel,
thanks for replying :)
This is the log tail:
tail ./restore-2017-04-04T13:22:43+02:00/restore.log
(00.019892) cg: rewriting
docker/3cd49e65ade7f1221b63e078c104700fd36a165662363c54f23e8b17f0d7fc36
to /docker/3cdc03490529b739a14096d3c012f325411ef3149f9a72662920fd99ef152305
(00.019907) cg: rewriting
docker/3cd49e65ade7f1221b63e078c104700fd36a165662363c54f23e8b17f0d7fc36
to /docker/3cdc03490529b739a14096d3c012f325411ef3149f9a72662920fd99ef152305
(00.019913) cg: rewriting
docker/3cd49e65ade7f1221b63e078c104700fd36a165662363c54f23e8b17f0d7fc36
to /docker/3cdc03490529b739a14096d3c012f325411ef3149f9a72662920fd99ef152305
(00.019919) cg: rewriting
docker/3cd49e65ade7f1221b63e078c104700fd36a165662363c54f23e8b17f0d7fc36
to /docker/3cdc03490529b739a14096d3c012f325411ef3149f9a72662920fd99ef152305
(00.019925) cg: rewriting
docker/3cd49e65ade7f1221b63e078c104700fd36a165662363c54f23e8b17f0d7fc36
to /docker/3cdc03490529b739a14096d3c012f325411ef3149f9a72662920fd99ef152305
(00.019931) cg: rewriting
docker/3cd49e65ade7f1221b63e078c104700fd36a165662363c54f23e8b17f0d7fc36
to /docker/3cdc03490529b739a14096d3c012f325411ef3149f9a72662920fd99ef152305
(00.019940) cg: Preparing cgroups yard (cgroups restore mode 0x4)
(00.020818) cg: Opening .criu.cgyard.Q1QaGh as cg yard
(00.020947) cg: Making controller dir .criu.cgyard.Q1QaGh/net_cls (net_cls)
(00.021063) Error (cgroup.c:1562): cg: Can't mount controller dir
.criu.cgyard.Q1QaGh/net_cls: Device or resource busy
I doubt it is Docker-related in this case, as a colleague of mine
experienced similar trouble using CRIU without Docker (sorry, no logs).
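For what it's worth, cgroup v1 refuses a mount with EBUSY when a requested
controller is already attached to an existing hierarchy with a different
controller set, so comparing how net_cls is mounted on the two hosts may be a
useful check. Whether that is the cause here is only a guess. A minimal Python
sketch, assuming the standard /proc/self/mountinfo and /proc/cgroups layouts;
the function names are hypothetical helpers, not part of CRIU:

```python
def controller_names(proc_cgroups_text):
    """Return the set of controller names listed in /proc/cgroups content."""
    names = set()
    for line in proc_cgroups_text.splitlines():
        # Skip the header line ("#subsys_name ...") and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split()[0])
    return names


def cgroup_v1_hierarchies(mountinfo_text, controllers):
    """Map each cgroup v1 mount point to the sorted controllers bound to it."""
    result = {}
    for line in mountinfo_text.splitlines():
        # mountinfo separates per-mount fields from filesystem fields by " - ".
        if " - " not in line:
            continue
        pre, post = line.split(" - ", 1)
        post_fields = post.split()
        # post_fields: filesystem type, mount source, superblock options.
        if len(post_fields) < 3 or post_fields[0] != "cgroup":
            continue
        mount_point = pre.split()[4]
        options = post_fields[2].split(",")
        # Keep only options that are controller names (drops rw, xattr, name=...).
        bound = sorted(o for o in options if o in controllers)
        if bound:
            result[mount_point] = bound
    return result


def comounted_with(hierarchies, controller):
    """Return (mount_point, controllers) for the hierarchy holding `controller`."""
    for mount_point, bound in hierarchies.items():
        if controller in bound:
            return mount_point, bound
    return None
```

Feeding it the contents of /proc/cgroups and /proc/self/mountinfo from both the
CentOS 7.2 and 7.3 hosts and calling comounted_with(hierarchies, "net_cls")
would show whether net_cls shares a hierarchy with another controller on
either side.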
The kernel versions are:
CentOS7.3
Name : kernel
Arch : x86_64
Version : 3.10.0
Release : 514.10.2.el7
CentOS7.2
Name : kernel
Arch : x86_64
Version : 3.10.0
Release : 327.el7
>> Is updating criu safe?
>
> Upgrading CRIU is safe in the sense that a newer criu must read older images
> and understand them. But as I said, restoring a container is much more
> than just using criu. Docker may change. The kernel can change in an
> incompatible manner too, but that's rare and should be explicitly turned on.
>
>> Is maintaining downwards compatibility for checkpoint restores a goal/on
>> the roadmap for criu and/or criu integration into docker?
>
> Downgrading any component is not guaranteed to work in 100% of cases :) We
> sometimes change criu so that older versions stop understanding newer
> images. But not vice versa.
Then our key question is:
Does the CRIU project attempt to make C/R work across system/kernel/criu
updates (e.g. if the updated kernel version changes the way cgroups behave)?
I guess that might be hard to accomplish...
>> Is there, after all, a lesser chance of breakage using docker instead of
>> non-containerized criu apps?
>
> Having Docker checkpoint/restore broken is more likely than having pure
> criu broken :) For us C/R is the main feature, for Docker C/R is experimental,
> so they don't monitor it well enough (yet).
thanks & greetings
Stefan
--
Stefan Kombrink
Universität Ulm
kiz / Abteilung Infrastruktur
+49-731-50-22439