[CRIU] Maintain long running HPC jobs across sysupgrades using CRIU?
Stefan Kombrink
stefan.kombrink at uni-ulm.de
Wed Apr 5 03:24:41 PDT 2017
Hi folks,
I work in an HPC environment where we want to checkpoint long-running
processes every couple of days.
So far, checkpointing and restore seem to work fine with the latest
docker-ce and criu 2.3.
But we also occasionally do system updates. While restore was okay
after a kernel update, I wasn't able to restore after upgrading from
CentOS 7.2 to CentOS 7.3 (the error said that a cgroup mount failed).
So my questions are:
Which upgrades/updates might break restore functionality?
Is updating criu safe?
Is maintaining backward compatibility for checkpoint restores a goal or on
the roadmap for criu and/or the criu integration in docker?
Is breakage, after all, less likely when using docker than with
non-containerized criu apps?
Thanks and regards,
Stefan
--
Stefan Kombrink
Universität Ulm
kiz / Abteilung Infrastruktur
+49-731-50-22439