[CRIU] [RFC PATCH 00/20] CRIU support for ROCm
Felix Kuehling
Felix.Kuehling at amd.com
Sat May 1 04:58:25 MSK 2021
A whitepaper describing our design can be found here:
https://github.com/RadeonOpenCompute/criu/blob/criu-dev/test/others/ext-kfd/README.md
Most of the patches implement our device file plugin. We are most
interested in feedback on the few patches that modify
core CRIU code. I'm pretty sure we don't know what we're doing here, so
your insights will be appreciated:
01/20 - Treat some unsupported VMAs as regular
03/20 - Add offset and file path plugin
07/20 - Introduce restore late stage hook
20/20 - *RFC* Don't cache fd for amdgpu devices
The corresponding kernel patch series will be discussed on
amd-gfx at lists.freedesktop.org and dri-devel at lists.freedesktop.org. The
KFD patches are also available on GitHub:
https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/commits/fxkamd/criu-wip
This patch series is also on GitHub:
https://github.com/RadeonOpenCompute/criu/commits/criu-dev
David Yat Sin (7):
criu/plugin: Add support for dumping and restoring queues
criu/plugin: Support larger memory footprints
criu/plugin: Dump and restore events
criu/plugin: Re-adjust doorbell offset for queues
criu/plugin: Implement system topology parsing
criu/plugin: Remap GPUs on checkpoint restore
criu/plugin: Add parameters to override mapping
Rajneesh Bhardwaj (13):
criu/parse: Treat some unsupported VMAs as regular
criu/plugin: Initialize AMD KFD header
criu/files-reg: Add offset and file path plugin
criu/plugin: Support AMD ROCm Checkpoint Restore with KFD
criu/plugin: Optimize the proto image size
criu/plugin: optimization for large bar read
criu/restore: Introduce restore late stage hook
criu/plugin: Implement restore late hook for kfd
criu/plugin: dump debug logs selectively
criu/plugin: Add initial documentation for ROCm support.
criu/plugin: Pytorch container with criu
criu/plugin: Dockerfile for AMD criu repo
criu/files: *RFC* Don't cache fd for amdgpu devices
Documentation/Makefile | 1 +
Documentation/kfd_plugin.txt | 79 ++
criu/cr-restore.c | 15 +
criu/file-ids.c | 11 +-
criu/files-reg.c | 18 +
criu/include/criu-plugin.h | 12 +
criu/include/proc_parse.h | 3 +
criu/plugin.c | 2 +
criu/proc_parse.c | 51 +-
test/others/ext-kfd/Dockerfile | 95 ++
test/others/ext-kfd/Dockerfile.AMD | 114 ++
test/others/ext-kfd/Makefile | 13 +
test/others/ext-kfd/criu-kfd.proto | 107 ++
test/others/ext-kfd/kfd_ioctl.h | 692 ++++++++++
test/others/ext-kfd/kfd_plugin.c | 1917 ++++++++++++++++++++++++++++
15 files changed, 3124 insertions(+), 6 deletions(-)
create mode 100644 Documentation/kfd_plugin.txt
create mode 100644 test/others/ext-kfd/Dockerfile
create mode 100644 test/others/ext-kfd/Dockerfile.AMD
create mode 100644 test/others/ext-kfd/Makefile
create mode 100644 test/others/ext-kfd/criu-kfd.proto
create mode 100644 test/others/ext-kfd/kfd_ioctl.h
create mode 100644 test/others/ext-kfd/kfd_plugin.c
--
2.17.1