[CRIU] Checkpointing Pods
Adrian Reber
adrian at lisas.de
Tue Apr 28 15:38:21 MSK 2020
I started to integrate checkpoint/restore into CRI-O (https://cri-o.io/),
where containers always run in pods.
As a first step I wanted to checkpoint a single container in a pod, and
as the code base is similar to Podman's, it does not look too
complicated.
However, trying to checkpoint a container fails with:
(01.042480) Error (criu/namespaces.c:1081): Can't dump a pid namespace without the process init
If I understand pods correctly, there is always at least one container
running in a pod: the one with the 'pause' process (k8s.gcr.io/pause:3.2 in my
tests). I think it exists because without a running container
the pod would not exist.
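The CRIU error presumably means that the process being dumped is not PID 1 of the pid namespace it lives in; in a pod that role belongs to the pause process. One way to check this is the NSpid line in /proc/<pid>/status (Linux 4.1 and later), which lists the process's PID in each nested pid namespace, outermost first. The helper names below are hypothetical; this is a minimal sketch, not code from CRI-O or CRIU.

```python
from typing import List


def parse_nspid(status_text: str) -> List[int]:
    """Return the NSpid values from /proc/<pid>/status, outermost namespace first."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    raise ValueError("no NSpid line found (kernel older than 4.1?)")


def is_pidns_init(status_text: str) -> bool:
    """True if the process is PID 1 in its innermost pid namespace."""
    # The last NSpid entry is the PID as seen from the process's own namespace.
    return parse_nspid(status_text)[-1] == 1


def read_status(pid: int) -> str:
    """Read /proc/<pid>/status of a live process (hypothetical helper)."""
    with open(f"/proc/{pid}/status") as f:
        return f.read()
```

With a shared pid namespace, `is_pidns_init(read_status(1038))` would be expected to be true for the pause process and false for the main process of any other container in the pod.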
I am not sure which namespaces are shared between the containers of a pod
and which are not, but to solve the problem above I think the
mechanism already used for network namespaces is needed for all
namespaces.
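Which namespaces two processes share can be checked directly: two processes are in the same namespace of a given type when /proc/<pid>/ns/<type> of both refers to the same (device, inode) pair. A minimal sketch, with hypothetical function names; reading another process's ns links needs ptrace-read access, so in practice it would run as root.

```python
import os
from typing import Dict, Iterable, List, Tuple

NS_TYPES = ("pid", "net", "ipc", "uts", "mnt", "user", "cgroup")


def read_ns_ids(pid: int, types: Iterable[str] = NS_TYPES) -> Dict[str, Tuple[int, int]]:
    """Map namespace type -> (st_dev, st_ino) of /proc/<pid>/ns/<type>."""
    ids = {}
    for ns in types:
        try:
            st = os.stat(f"/proc/{pid}/ns/{ns}")  # os.stat follows the ns symlink
        except FileNotFoundError:
            continue  # this kernel does not expose that namespace type
        ids[ns] = (st.st_dev, st.st_ino)
    return ids


def shared_namespaces(ids_a: Dict[str, Tuple[int, int]],
                      ids_b: Dict[str, Tuple[int, int]]) -> List[str]:
    """Namespace types in which both processes live in the same namespace."""
    return sorted(ns for ns, ident in ids_a.items() if ids_b.get(ns) == ident)
```

For example, `shared_namespaces(read_ns_ids(1038), read_ns_ids(container_pid))` (where `container_pid` is a placeholder for the PID of the container being checkpointed) lists the namespaces it inherits from the pause process.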
When checkpointing and restoring a container with Podman, CRIU is always told
to ignore the network namespace, because Podman (CNI, in fact) handles it:
criu dump --external net[<inode>]:netns-name -t <PID>
criu restore --inherit-fd fd[<FD>]:netns-name
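The option values in the two commands above can be assembled programmatically: on dump, the inode of /proc/<PID>/ns/net names the namespace; on restore, a file descriptor already open on that namespace is handed to criu under the same key. The sketch below only formats the command lines. Like the commands above, it leaves out the other options a real checkpoint needs. Passing types other than `net` to `--external` is the proposal of this mail, and whether a given CRIU version accepts them is an assumption the sketch does not check. All function names are hypothetical.

```python
import os
from typing import Dict, List


def ns_inode(pid: int, ns_type: str) -> int:
    """Inode number of /proc/<pid>/ns/<ns_type>, as used inside --external."""
    return os.stat(f"/proc/{pid}/ns/{ns_type}").st_ino


def external_arg(ns_type: str, inode: int, key: str) -> str:
    """Format a value such as net[4026532008]:netns-name."""
    return f"{ns_type}[{inode}]:{key}"


def dump_cmd(pid: int, externals: List[str]) -> List[str]:
    """criu dump command line marking each given namespace as external."""
    cmd = ["criu", "dump", "-t", str(pid)]
    for ext in externals:
        cmd += ["--external", ext]
    return cmd


def restore_cmd(inherited: Dict[str, int]) -> List[str]:
    """criu restore command line; maps each key to an fd open in the criu process."""
    cmd = ["criu", "restore"]
    for key, fd in inherited.items():
        cmd += ["--inherit-fd", f"fd[{fd}]:{key}"]
    return cmd
```

On restore, the descriptor has to survive into the criu process with the same number, e.g. `fd = os.open(ns_path, os.O_RDONLY)` followed by `subprocess.run(restore_cmd({"netns-name": fd}), pass_fds=[fd], check=True)`, where `ns_path` is a placeholder for the namespace file to join.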
I would like to extend this to all namespaces, as the config.json for
my container shows that almost all namespaces are shared:
"namespaces": [
    {
        "type": "NEWPID",
        "path": "/proc/1038/ns/pid"
    },
    {
        "type": "NEWNET",
        "path": "/proc/1038/ns/net"
    },
    {
        "type": "NEWIPC",
        "path": "/proc/1038/ns/ipc"
    },
    {
        "type": "NEWUTS",
        "path": "/proc/1038/ns/uts"
    },
    {
        "type": "NEWNS",
        "path": ""
    }
],
PID 1038 is the pause process.
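The list of namespaces to treat as external can be derived from this config: an entry with a non-empty path joins an existing namespace (here the pause process's), and an empty path means a new namespace is created for the container. A minimal sketch that takes the already-parsed "namespaces" list; the mapping to /proc names and the function name are assumptions for illustration, and paths that are not of the form /proc/<pid>/ns/<type> (such as bind-mounted namespace files) are skipped.

```python
import re
from typing import Dict, List

# Type names as they appear in the config above, mapped to /proc/<pid>/ns/ names.
PROC_NS_NAME = {
    "NEWPID": "pid",
    "NEWNET": "net",
    "NEWIPC": "ipc",
    "NEWUTS": "uts",
    "NEWNS": "mnt",
    "NEWUSER": "user",
    "NEWCGROUP": "cgroup",
}


def shared_from_config(namespaces: List[dict]) -> Dict[str, int]:
    """Map /proc namespace name -> PID whose namespace the container joins."""
    shared = {}
    for ns in namespaces:
        path = ns.get("path", "")
        if not path:
            continue  # empty path: a new namespace is created for the container
        m = re.fullmatch(r"/proc/(\d+)/ns/\w+", path)
        if m is None:
            continue  # e.g. a bind-mounted namespace file; not handled here
        shared[PROC_NS_NAME.get(ns["type"], ns["type"])] = int(m.group(1))
    return shared
```

For the config above this yields pid, net, ipc and uts, all pointing at PID 1038, while the mount namespace (empty path) belongs to the container itself.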
I am mentioning it here in case I have misunderstood something. Any
comments on whether this is right or wrong?
Adrian