[CRIU] Checkpointing Pods

Adrian Reber adrian at lisas.de
Tue Apr 28 15:38:21 MSK 2020


I started to integrate checkpoint/restore in CRI-O (https://cri-o.io/)
and its containers are always running in pods.

As a first step I wanted to checkpoint a single container in a pod, and
as the code base has similarities to Podman it does not look too
complicated.

However, trying to checkpoint a container fails with:

(01.042480) Error (criu/namespaces.c:1081): Can't dump a pid namespace without the process init

If I understood pods correctly there is always at least one container
running in a pod with a 'pause' process (k8s.gcr.io/pause:3.2 in my
tests). I think this exists because without a container running in a pod,
the pod would not exist.

I do not know which namespaces are shared between the containers in a
pod and which are not, but to solve the problem above I think the same
mechanism that is used for network namespaces is also needed for all
namespaces.
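One way to see which namespaces two processes share is to compare the
symlinks under /proc/<pid>/ns/. The following is only a minimal sketch:
the function name, the proc_root parameter and the list of namespace
types are illustrative assumptions and not part of CRI-O or CRIU.

```python
import os

# Namespace entries the kernel exposes under /proc/<pid>/ns/ (a subset).
NS_TYPES = ("pid", "net", "ipc", "uts", "mnt")


def shared_namespaces(pid_a, pid_b, proc_root="/proc"):
    """Map each namespace type to True if both PIDs are in the same one."""
    result = {}
    for ns in NS_TYPES:
        # Each entry is a symlink whose target looks like "pid:[4026531836]";
        # identical targets mean the two processes share that namespace.
        link_a = os.readlink(os.path.join(proc_root, str(pid_a), "ns", ns))
        link_b = os.readlink(os.path.join(proc_root, str(pid_b), "ns", ns))
        result[ns] = link_a == link_b
    return result
```

Calling shared_namespaces(<pause PID>, <container PID>) on the host
would then show which namespace types the container has in common with
the pause process. Reading these links for other processes usually
requires root.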

When checkpointing and restoring a container with Podman, CRIU is always
told to ignore the network namespace because Podman (CNI in fact) handles
the network namespace:

 criu dump --external net[<inode>]:netns-name -t <PID>
 criu restore --inherit-fd fd[<FD>]:netns-name
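The <inode> in the dump command above is the inode number of the
network namespace, which can be read from the /proc/<PID>/ns/net
symlink. A minimal sketch of how the argument list could be assembled
follows; the helper names and the proc_root parameter are hypothetical,
"netns-name" is just the label used above, and a real dump needs further
options that are left out here.

```python
import os
import re


def netns_inode(pid, proc_root="/proc"):
    """Return the inode number of the network namespace of `pid`."""
    # The link target has the form "net:[4026532008]".
    target = os.readlink(os.path.join(proc_root, str(pid), "ns", "net"))
    match = re.fullmatch(r"net:\[(\d+)\]", target)
    if match is None:
        raise ValueError("unexpected namespace link: %r" % target)
    return int(match.group(1))


def dump_args(pid, name="netns-name", proc_root="/proc"):
    """Build the argument list for the dump command shown above."""
    external = "net[%d]:%s" % (netns_inode(pid, proc_root), name)
    return ["criu", "dump", "--external", external, "-t", str(pid)]
```

The same label has to be passed again on restore, together with a file
descriptor for the namespace that should be used instead.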

I would like to extend this to all namespaces, as the config.json for
my container tells me that almost all namespaces are shared:

    "namespaces": [
      {
        "type": "NEWPID",
        "path": "/proc/1038/ns/pid"
      },
      {
        "type": "NEWNET",
        "path": "/proc/1038/ns/net"
      },
      {
        "type": "NEWIPC",
        "path": "/proc/1038/ns/ipc"
      },
      {
        "type": "NEWUTS",
        "path": "/proc/1038/ns/uts"
      },
      {
        "type": "NEWNS",
        "path": ""
      }
    ],

PID 1038 is the pause process.
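To make the split explicit, the namespace list can be separated into
entries that join an existing namespace (non-empty path) and entries
that get a new one (empty path). This is only a sketch over the fragment
above; the function name is hypothetical and only paths of the form
/proc/<pid>/ns/<type> are handled.

```python
import re


def split_namespaces(namespaces):
    """Split namespace entries into joined and private ones.

    Returns (joined, private): `joined` maps a namespace type to the PID
    whose namespace is joined, `private` lists types with an empty path.
    """
    joined, private = {}, []
    for entry in namespaces:
        path = entry.get("path", "")
        match = re.fullmatch(r"/proc/(\d+)/ns/\w+", path)
        if match:
            joined[entry["type"]] = int(match.group(1))
        elif path == "":
            private.append(entry["type"])
        else:
            # For example a bind-mounted namespace file; not handled here.
            raise ValueError("unhandled namespace path: %r" % path)
    return joined, private


# The entries from the config.json fragment above.
namespaces = [
    {"type": "NEWPID", "path": "/proc/1038/ns/pid"},
    {"type": "NEWNET", "path": "/proc/1038/ns/net"},
    {"type": "NEWIPC", "path": "/proc/1038/ns/ipc"},
    {"type": "NEWUTS", "path": "/proc/1038/ns/uts"},
    {"type": "NEWNS", "path": ""},
]
joined, private = split_namespaces(namespaces)
```

For this fragment, joined contains the PID, network, IPC and UTS
namespaces, all pointing at PID 1038, and private contains only the
mount namespace.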

I am mentioning it here just in case I misunderstood something. Any
comments on whether this is right or wrong?

		Adrian

