[CRIU] [RFC PATCH 13/20] criu/plugin: Add initial documentation for ROCm support.
Felix Kuehling
Felix.Kuehling at amd.com
Sat May 1 04:58:38 MSK 2021
From: Rajneesh Bhardwaj <rajneesh.bhardwaj at amd.com>
- Add placeholder and initial documentation.
- Keep it with the main CRIU documentation.
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj at amd.com>
---
Documentation/Makefile | 1 +
Documentation/kfd_plugin.txt | 45 ++++++++++++++++++++++++++++++++++++
2 files changed, 46 insertions(+)
create mode 100644 Documentation/kfd_plugin.txt
diff --git a/Documentation/Makefile b/Documentation/Makefile
index 5025e2b99..044a31eda 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -13,6 +13,7 @@ endif
FOOTER := footer.txt
SRC1 += crit.txt
SRC1 += compel.txt
+SRC1 += kfd_plugin.txt
SRC8 += criu.txt
SRC := $(SRC1) $(SRC8)
XMLS := $(patsubst %.txt,%.xml,$(SRC))
diff --git a/Documentation/kfd_plugin.txt b/Documentation/kfd_plugin.txt
new file mode 100644
index 000000000..4caf489c1
--- /dev/null
+++ b/Documentation/kfd_plugin.txt
@@ -0,0 +1,45 @@
+ROCm Support(1)
+===============
+
+NAME
+----
+kfd_plugin - A plugin extension to CRIU that supports checkpoint/restore in
+userspace for AMD GPUs.
+
+
+CURRENT SUPPORT
+---------------
+* Single-GPU systems (Gfx9)
+* Checkpoint / Restore on the same system
+* Checkpoint / Restore inside a Docker container
+* PyTorch
+
+DESCRIPTION
+-----------
+Although *criu* works well for checkpointing and restoring running
+applications, it has certain limitations; for example, it cannot handle
+applications that have device files open. Supporting *ROCm*-based workloads
+with *criu* therefore requires augmenting its core functionality through a
+plugin-based extension mechanism. *kfd_plugin* provides the support *criu*
+needs to checkpoint and restore ROCm applications.
+
+
+Dependencies
+~~~~~~~~~~~~
+*amdkfd support*::
+ Snapshotting *VRAM* and other *GPU* device state requires an updated
+ version of the amdkfd (amdgpu) driver. The required kernel patches are
+ currently under review.
+
+*criu 3.15*::
+ This work is rebased on the latest *criu* release available at the time of writing.
+
+
+AUTHOR
+------
+The AMDKFD team.
+
+
+COPYRIGHT
+---------
+Copyright \(C) 2020-2021, Advanced Micro Devices, Inc. (AMD)
--
2.17.1