[CRIU] Behavioral guarantees of a Restore involving a plugin

Ramesh Errabolu ramesh.errabolu at gmail.com
Mon Sep 18 11:05:34 MSK 2023


I am trying to understand the "*restore*" call sequence for a process tree
that involves a plugin.

*Scenario*:

   - The system has two GPUs
   - The parent process forks a child
   - The checkpointing and restore procedures are handled by the *amdgpu* plugin
   - The checkpointing of a process results in three files
   - For this scenario, that results in six files
      - 3 files for the parent labeled as PF1, PF2 and PF3
      - 3 files for the child labeled as CF1, CF2 and CF3

*Questions*:

   - What kind of behavioral guarantees can I expect from the CRIU framework?
   - In my experimentation:
      - The CRIU framework tries to restore the two processes concurrently
      - For any process being restored
         - CRIU calls the plugin API with the file handles in the order in
         which they were generated
      - For the parent process this will be PF1 -> PF2 -> PF3
      - For the child process this will be CF1 -> CF2 -> CF3
   - Is my statement about the per-process restoration order (PF1 -> PF2 -> PF3)
   correct?
      - If so, another sequence such as (PF2 -> PF1 -> PF3) would be
      considered invalid
   - Is there a particular order in which process restoration begins?
   - In my experimentation, restoration of the child process begins first
      - Is this behavior designed into the restore algorithm?
   - Since restoration of processes seems to be concurrent
      - Is it correct to assume that state from one process will NOT BE
      available to another process of the same tree?
         - In this example state from parent process would NOT become
         available for child process or vice-versa
      - Per my experience users cannot assume that state from one process
      would be available to another process
      - The order in which processes are restored could change
      - The restoration of a process's state could be fragmented across
      multiple files
         - In this example, the state is checkpointed into 3 files
      - A global ordering of restore is considered valid if it ensures the
      files of each process are restored in order, without regard to the
      files of other processes
      - The following restore orders are legal
         - *PF1, PF2, PF3, CF1, CF2, CF3*
         - *CF1, PF1, CF2, PF2, CF3, PF3*
         - *CF1, CF2, PF1, CF3, PF2, PF3*
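
To make the validity rule I am assuming concrete, here is a minimal Python sketch. It is not CRIU or plugin code; the names `FILES`, `is_valid_order` and `legal_orders` are hypothetical and exist only to illustrate my understanding. A global order is accepted if each process sees its own files in checkpoint order, regardless of how the two processes interleave.

```python
from itertools import permutations

# Hypothetical labels from the scenario above: files per process,
# listed in the order in which they were generated at checkpoint time.
FILES = {
    "parent": ["PF1", "PF2", "PF3"],
    "child": ["CF1", "CF2", "CF3"],
}


def is_valid_order(global_order, per_process):
    """Return True if every process's files appear in checkpoint order.

    Interleaving between different processes is not constrained.
    """
    expected = [f for files in per_process.values() for f in files]
    if sorted(global_order) != sorted(expected):
        return False  # a file is missing, duplicated, or unknown
    position = {name: i for i, name in enumerate(global_order)}
    return all(
        position[a] < position[b]
        for files in per_process.values()
        for a, b in zip(files, files[1:])
    )


def legal_orders(per_process):
    """Enumerate every global order that satisfies is_valid_order."""
    all_files = [f for files in per_process.values() for f in files]
    return [
        list(p) for p in permutations(all_files)
        if is_valid_order(p, per_process)
    ]
```

Under this rule the three orders listed above are accepted, an order starting with PF2 -> PF1 is rejected, and `legal_orders(FILES)` yields 20 orders for two processes with three files each.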

I hope my explanation is clear. Looking forward to responses that would
help me correct or improve my understanding.

Regards,
Ramesh