[CRIU] [RFC PATCH 13/20] criu/plugin: Add initial documentation for ROCm support.

Felix Kuehling Felix.Kuehling at amd.com
Sat May 1 04:58:38 MSK 2021


From: Rajneesh Bhardwaj <rajneesh.bhardwaj at amd.com>

 - Add placeholder and initial documentation for the KFD plugin.
 - Keep it alongside the main CRIU documentation.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj at amd.com>
---
 Documentation/Makefile       |  1 +
 Documentation/kfd_plugin.txt | 45 ++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)
 create mode 100644 Documentation/kfd_plugin.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 5025e2b99..044a31eda 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -13,6 +13,7 @@ endif
 FOOTER		:= footer.txt
 SRC1		+= crit.txt
 SRC1		+= compel.txt
+SRC1		+= kfd_plugin.txt
 SRC8		+= criu.txt
 SRC		:= $(SRC1) $(SRC8)
 XMLS		:= $(patsubst %.txt,%.xml,$(SRC))
diff --git a/Documentation/kfd_plugin.txt b/Documentation/kfd_plugin.txt
new file mode 100644
index 000000000..4caf489c1
--- /dev/null
+++ b/Documentation/kfd_plugin.txt
@@ -0,0 +1,45 @@
+ROCm Support(1)
+===============
+
+NAME
+----
+kfd_plugin - A plugin extension to CRIU to support checkpoint/restore in
+userspace for AMD GPUs.
+
+
+CURRENT SUPPORT
+---------------
+* Single-GPU systems (Gfx9)
+* Checkpoint / Restore on the same system
+* Checkpoint / Restore inside a Docker container
+* PyTorch
+
+DESCRIPTION
+-----------
+Although *criu* is a capable tool for checkpointing and restoring running
+applications, it has limitations: for example, it cannot handle applications
+that have device files open. To support *ROCm* based workloads with *criu*,
+its core functionality must be augmented through a plugin-based extension
+mechanism. *kfd_plugin* provides the support that criu needs to checkpoint
+and restore ROCm applications.
+
+
+Dependencies
+~~~~~~~~~~~~
+*amdkfd support*::
+    Snapshotting *VRAM* and other *GPU* device state requires an updated
+    version of the amdkfd (amdgpu) driver. The kernel patches are currently
+    under review.
+
+*criu 3.15*::
+    This work is rebased on the latest criu release available at the time of writing.
+
+
+AUTHOR
+------
+The AMDKFD team.
+
+
+COPYRIGHT
+---------
+Copyright \(C) 2020-2021, Advanced Micro Devices, Inc. (AMD)
-- 
2.17.1


