<div dir="auto"><div>Hi Felix,<div dir="auto"><br></div><div dir="auto">Please see my comments inline.</div><br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, 16 Jun 2020 at 4:04, Felix Kuehling &lt;<a href="mailto:felix.kuehling@gmail.com">felix.kuehling@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div>

    <div lang="x-unicode">

      <p>Hi all,</p>

      <p>I&#39;m investigating the possibility of making CRIU work with

        ROCm, the AMD Radeon Open Compute Platform. I need some advice,

        but I&#39;ll give you some background first.</p>

      <p>ROCm uses the /dev/kfd device as well as /dev/dri/renderD*. I&#39;m

        planning to do most of the state saving using /dev/kfd with a

        cr_plugin_dump_file callback in a plugin. I&#39;ve spent some time

        reading documentation on <a href="http://criu.org" target="_blank" rel="noreferrer">criu.org</a> and also CRIU source code. At

        this point I believe I have a fairly good understanding of the

        low level details of saving kernel mode state associated with

        ROCm processes.</p>

      <p>I have more trouble with restoring the state. The main issue is

        the way KFD maps system memory for device access using HMM (or

        get_user_pages and MMU notifiers with DKMS on older kernels).

        This requires the VMAs to be at the expected virtual addresses

        before we try to mirror them into the GPU page table.</p></div></div></blockquote></div></div><div dir="auto"><br></div><div dir="auto">What if the system memory in this area is shared between several processes and mapped at a different virtual address in each of them? Presumably the only requirement is that it is mapped at the correct virtual address in the calling process?</div><div dir="auto"><br></div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div lang="x-unicode"><p> Resuming

        execution on the GPU also needs to be delayed until after the

        GPU memory mappings have been restored.<br>

      </p>

      <p>At the time of the cr_plugin_restore_file callback, the VMAs

        are not at the right place in the restored process, so this is

        too early to restore the GPU memory mappings.</p></div></div></blockquote></div></div><div dir="auto">True. At this point some VMAs are in the so-called premapped area and some don&#39;t exist yet.</div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div lang="x-unicode"><p>I can send the

        mappings and their properties to KFD but KFD needs to wait for

        some later trigger event before it activates the mappings and

        their MMU notifiers.</p>

      <p>So this is my question: What would be a good trigger event to

        indicate that VMAs have been moved to their proper location by

        the restorer parasite code?</p></div></div></blockquote></div></div><div dir="auto">We have &quot;restore stages&quot; that are used to synchronize all the processes at specific points. The last three correspond to points where all memory is already at its required virtual address. If you wait for one of these stages to complete and don&#39;t start the next one, you can safely run whatever is needed while every task&#39;s memory is mapped at its final virtual address.</div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div lang="x-unicode"><p> I have considered two possibilities

        that will not work. I&#39;m hoping you can give me some better

        ideas:</p>

      <ul>

        <li>cr_plugin_fini</li>

        <ul>

          <li>Doesn&#39;t get called in all the child processes, not sure if

            there is synchronization with the child processes&#39; restore

            completion<br></li></ul></ul></div></div></blockquote></div></div><div dir="auto">Yes, it&#39;s called in the master process after all stages have completed. It&#39;s a cleanup hook.</div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div lang="x-unicode"><ul><ul><li>

          </li>

        </ul>

        <li>An MMU notifier on the munmap of the restorer parasite blob

          itself</li>

        <ul>

          <li>In cr_plugin_restore_file this address is not known yet</li></ul></ul></div></div></blockquote></div></div><div dir="auto">Can the restoring code run in the CRIU main process instead of one of the child processes? If so, that could make things simpler. You could add another plugin invocation near apply_memfd_seal; that is the point where all child processes are stopped with their VMAs properly restored. But again, this code runs in the context of the CRIU master process, not the children.</div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div lang="x-unicode"><ul><ul>

        </ul>

      </ul>

      <p>I noticed that the child processes are resumed through

        sigreturn. I&#39;m not familiar with this mechanism. Does this mean

        there is some signal I may be able to intercept just before

        execution of the child process resumes?</p></div></div></blockquote></div></div><div dir="auto">No, sigreturn is just the mechanism we use to restore the tasks&#39; registers :)</div><div dir="auto"><br></div><div dir="auto">-- Pavel</div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div lang="x-unicode">

      <p>Thank you in advance for your insights.</p>

      <p>Best regards,<br>

          Felix</p>

      <p><br>

      </p>

    </div>

  </div>

_______________________________________________<br>

CRIU mailing list<br>

<a href="mailto:CRIU@openvz.org" target="_blank" rel="noreferrer">CRIU@openvz.org</a><br>

<a href="https://lists.openvz.org/mailman/listinfo/criu" rel="noreferrer noreferrer" target="_blank">https://lists.openvz.org/mailman/listinfo/criu</a><br>

</blockquote></div></div></div>