<div dir="ltr">The container runs only that one process, but I have pools of identical containers and checkpoint/restore into them unpredictably, so underlying things like mount points and file descriptors change. Managing that is what I&#39;m using Docker for.</div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, May 12, 2015 at 2:46 PM, Ruslan Kuprieiev <span dir="ltr">&lt;<a href="mailto:kupruser@gmail.com" target="_blank">kupruser@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000">
    Oh, so the whole container is being dumped, not just that one
    process?<br>
    Hm, you might be able to call criu_dump() on the whole container<br>
    from within that process, just as in the code I showed you below
    (but specifying the container&#39;s<br>
    pid), and get the same results. The return value of 1 from criu_dump()
    works because criu<br>
    puts a proper response packet into the service socket when
    restoring a process tree,<br>
    so everything should work.<div><div class="h5"><br>
    <br>
    <div>On 05/13/2015 12:36 AM, Ross Boucher
      wrote:<br>
    </div>
    <blockquote type="cite">
      <div dir="ltr">That&#39;s an interesting idea. Though, my process is
        inside of a docker container, and I think it would get upset by
        being restored into a different container. I think I need the
        coordination docker is doing in order for my system to work.</div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Tue, May 12, 2015 at 2:27 PM, Ruslan
          Kuprieiev <span dir="ltr">&lt;<a href="mailto:kupruser@gmail.com" target="_blank">kupruser@gmail.com</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div bgcolor="#FFFFFF" text="#000000"> I&#39;m saying that you
              might want to consider calling criu_dump() from a process
              that you are<br>
              trying to dump. We call it self dump[1]. For example,
              using criu_dump() from libcriu it might look like:<br>
              <br>
              ...<br>
              while (1) {<br>
                  ret = criu_dump();<br>
                  if (ret &lt; 0) {<br>
                      /* error */<br>
                  } else if (ret == 0) {<br>
                      /* dump is OK; this is still the original process */<br>
                  } else if (ret == 1) {<br>
                      /* this process has been restored: re-establish the<br>
                       * connection or do whatever a broken connection needs */<br>
                  }<br>
                  /* accept a connection and evaluate code */<br>
              }<br>
              ...<br>
              <br>
              [1] <a href="http://criu.org/Self_dump" target="_blank">http://criu.org/Self_dump</a>
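              <div>A minimal sketch of how the branches in the loop above might be filled in for an eval server like the one in this thread. It assumes only the return convention described here (negative on error, 0 after a successful dump, 1 in the restored copy); struct session and handle_dump_result() are hypothetical names, and the libcriu option setup that the loop elides as "..." is not shown.</div>

```c
#include <unistd.h>

/* Hypothetical per-process state for an eval server like the one in this thread. */
struct session {
    int conn_fd;   /* accepted client connection, or -1 if none */
    int restores;  /* number of times this process has been restored */
};

/* Handle one criu_dump() return value (convention as described in this thread):
 *   < 0  the dump failed
 *   0    the dump succeeded and this is still the original process
 *   1    this is a restored copy, so the old peer is gone
 * Returns 0 to keep serving, -1 to stop. */
int handle_dump_result(int ret, struct session *s)
{
    if (ret < 0)
        return -1;
    if (ret == 1) {
        s->restores++;
        if (s->conn_fd >= 0) {
            close(s->conn_fd);  /* drop the socket inherited from the checkpoint */
            s->conn_fd = -1;    /* the caller accepts a fresh connection */
        }
    }
    return 0;
}
```

              <div>In the loop, the result of each criu_dump() call would be passed to handle_dump_result(), and the loop would break on -1 and call accept() on the listening socket whenever conn_fd is -1.</div>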
              <div>
                <div><br>
                  <br>
                  <br>
                  <div>On 05/12/2015 11:25 PM, Ross Boucher wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div dir="ltr">I&#39;m not sure I follow. You&#39;re saying,
                      the process that actually calls restore would get
                      notified? Or, are you saying that somehow in the
                      restored process I can access something set by
                      criu?
                      <div><br>
                      </div>
                      <div>Assuming the former, I don&#39;t think that&#39;s
                        necessary: I already know that I&#39;ve just
                        restored the process. I could send a signal from
                        the coordinating process and use that signal to
                        cancel the read thread, which would amount to
                        mostly the same thing. But because the signal
                        would have to travel through quite a few layers,
                        it seems better, and faster, to do it from within
                        the restored
                        process itself.</div>
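                      <div>For comparison, the signal-based alternative mentioned above relies on a blocking read() returning EINTR when a handler installed without SA_RESTART runs. This is a minimal sketch assuming Linux/glibc; SIGUSR1 and pthread_kill() within one process stand in for a signal sent by the coordinating process, and interrupt_blocked_read() is a hypothetical helper, not part of CRIU.</div>

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Empty handler: its only purpose is to make a blocking syscall return EINTR. */
static void wake_handler(int sig) { (void)sig; }

/* Install the handler WITHOUT SA_RESTART so read() is not silently restarted. */
static int install_wake_handler(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = wake_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(sig, &sa, NULL);
}

struct reader_state {
    int fd;
    atomic_int done;   /* set once read() has returned */
    ssize_t result;
    int err;
};

static void *reader(void *arg)
{
    struct reader_state *st = arg;
    char buf[64];
    st->result = read(st->fd, buf, sizeof buf);  /* blocks: no data arrives */
    st->err = errno;
    atomic_store(&st->done, 1);
    return NULL;
}

/* Block a thread in read(fd), then interrupt it with sig.
 * Returns the errno seen by read() (EINTR expected), or -1 on setup failure. */
int interrupt_blocked_read(int fd, int sig)
{
    struct reader_state st = { .fd = fd, .result = 0, .err = 0 };
    atomic_init(&st.done, 0);
    pthread_t t;
    const struct timespec pause = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };

    if (install_wake_handler(sig) != 0)
        return -1;
    if (pthread_create(&t, NULL, reader, &st) != 0)
        return -1;
    /* A signal that lands before read() starts is absorbed by the handler,
     * so keep re-sending until the reader reports that read() returned. */
    while (!atomic_load(&st.done)) {
        pthread_kill(t, sig);
        nanosleep(&pause, NULL);
    }
    pthread_join(t, NULL);
    return st.result < 0 ? st.err : 0;
}
```

                      <div>If the signal came from another process via kill(pid, SIGUSR1), it could be delivered to any thread that does not block it, so the other threads would need to block SIGUSR1 (for example with pthread_sigmask()) for the reader to be the one interrupted.</div>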
                      <div><br>
                      </div>
                      <div>Perhaps I am just misunderstanding your
                        suggestion though.</div>
                      <div><br>
                      </div>
                    </div>
                    <div class="gmail_extra"><br>
                      <div class="gmail_quote">On Tue, May 12, 2015 at
                        12:37 PM, Ruslan Kuprieiev <span dir="ltr">&lt;<a href="mailto:kupruser@gmail.com" target="_blank">kupruser@gmail.com</a>&gt;</span>
                        wrote:<br>
                        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                          <div bgcolor="#FFFFFF" text="#000000"> Hi,
                            Ross<br>
                            <br>
                            When restoring via RPC or libcriu, the response
                            message has its &quot;restored&quot; field set to
                            true,<br>
                            which helps the process detect that it was
                            restored. You say the connection is broken every
                            time you restore,<br>
                            right? So maybe you could use the
                            &quot;restored&quot; flag?<br>
                            <br>
                            Thanks,<br>
                            Ruslan <br>
                            <div>
                              <div> <br>
                                <div>On 05/12/2015 09:59 PM, Ross
                                  Boucher wrote:<br>
                                </div>
                              </div>
                            </div>
                            <blockquote type="cite">
                              <div>
                                <div>
                                  <div dir="ltr">To get checkpoint/restore
                                    support working in my application, I&#39;ve
                                    resorted to a hack that works but is
                                    almost certainly not the best way to
                                    do things. I&#39;d be interested to hear
                                    if anyone has suggestions for a better way.
                                    First, let me explain how it works. 
                                    <div><br>
                                    </div>
                                    <div>The process I&#39;m checkpointing
                                      is a node.js process that opens a
                                      socket and waits for a connection
                                      on it. Once the connection is
                                      established, the connecting process
                                      sends code for the node.js process to
                                      evaluate, in a loop. The node
                                      process is checkpointed between
                                      consecutive messages containing new
                                      code to evaluate. </div>
                                    <div><br>
                                    </div>
                                    <div>Now, when we restore, it is
                                      always a completely new process
                                      sending code to the node.js
                                      process, so the built-in TCP
                                      socket restoration won&#39;t work. We
                                      had a lot of difficulty figuring
                                      out how to detect that the socket
                                      connection had been broken.
                                      Ultimately, the hack we settled on
                                      was to loop forever on a separate
                                      thread, checking the time and
                                      noticing whether an unexplained
                                      large gap in time had occurred. The
                                      looping thread looks
                                      like this:</div>
                                    <div><br>
                                    </div>
                                    <div><br>
                                    </div>
                                    <blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px">
                                      <div>void *canceler(void *threadPointer)</div>
                                      <div>{</div>
                                      <div>    pthread_t thread = *(pthread_t *)threadPointer;</div>
                                      <div><br></div>
                                      <div>    time_t start, end;</div>
                                      <div>    time(&amp;start);</div>
                                      <div><br></div>
                                      <div>    while (true)</div>
                                      <div>    {</div>
                                      <div>        usleep(1000);</div>
                                      <div>        time(&amp;end);</div>
                                      <div>        double diff = difftime(end, start);</div>
                                      <div><br></div>
                                      <div>        if (diff &gt; 1.0) {</div>
                                      <div>            // THIS IS ALMOST CERTAINLY A RESTORE</div>
                                      <div>            break;</div>
                                      <div>        }</div>
                                      <div>        start = end;  // compare consecutive samples, not time since the thread started</div>
                                      <div>    }</div>
                                      <div><br></div>
                                      <div>    // cancel the read thread</div>
                                      <div>    pthread_cancel(thread);</div>
                                      <div><br></div>
                                      <div>    return NULL;</div>
                                      <div>}</div>
                                    </blockquote>
                                    <div><br>
                                    </div>
                                    <div><br>
                                    </div>
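                                    <div>The check in canceler() uses time(), which has one-second resolution, so it can miss gaps of up to about two seconds. A minimal sketch of a finer-grained check, assuming clock_gettime() with CLOCK_REALTIME on Linux; elapsed_ms(), looks_like_restore() and wait_for_restore() are hypothetical helpers. It also treats the clock going backwards as a restore, on the assumption that a process restored on another host could see that.</div>

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <time.h>

/* Milliseconds from a to b; negative if b is earlier than a. */
long long elapsed_ms(const struct timespec *a, const struct timespec *b)
{
    long long ns = (long long)(b->tv_sec - a->tv_sec) * 1000000000LL
                 + (b->tv_nsec - a->tv_nsec);
    return ns / 1000000LL;
}

/* True if two consecutive samples are further apart than threshold_ms,
 * or the clock went backwards; either suggests the process was frozen. */
bool looks_like_restore(const struct timespec *prev,
                        const struct timespec *now,
                        long long threshold_ms)
{
    long long gap = elapsed_ms(prev, now);
    return gap < 0 || gap > threshold_ms;
}

/* Poll every poll_ms milliseconds until a suspicious gap appears.
 * The caller would cancel the read thread after this returns. */
void wait_for_restore(long poll_ms, long long threshold_ms)
{
    struct timespec prev, now;
    const struct timespec pause = { .tv_sec = poll_ms / 1000,
                                    .tv_nsec = (poll_ms % 1000) * 1000000L };
    clock_gettime(CLOCK_REALTIME, &prev);
    for (;;) {
        nanosleep(&pause, NULL);
        clock_gettime(CLOCK_REALTIME, &now);
        if (looks_like_restore(&prev, &now, threshold_ms))
            return;
        prev = now;  /* compare consecutive samples, not time since start */
    }
}
```

                                    <div>In canceler(), the while loop could then be replaced by a call such as wait_for_restore(1, 1000) before pthread_cancel(); the 1 ms poll and 1000 ms threshold are illustrative values, not tuned ones.</div>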
                                    <div>Elsewhere, in the code that
                                      actually does the reading, we
                                      spawn this thread and pass it a
                                      pointer to the read thread&#39;s handle:</div>
                                    <div><br>
                                    </div>
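                                    <blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px">
                                      <div>
                                        <div>// canceler() dereferences its argument, so pass the address of readThread</div>
                                      </div>
                                      <div>
                                        <div>pthread_create(&amp;cancelThread,
                                          NULL, canceler, (void
                                          *)&amp;readThread);</div>
                                      </div>
                                    </blockquote>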
                                    <div><br>
                                    </div>
                                    <div><br>
                                    </div>
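                                    <div>Because canceler() reads its argument as a pthread_t pointer, the handle has to be passed by address and must stay alive until the canceler has read it. A minimal self-contained sketch of that wiring, assuming a pipe stands in for the client socket and with the restore-detection loop left out; read_loop(), cancel_reader() and run_reader_with_canceler() are hypothetical names. It relies on read() being a POSIX cancellation point, so a pending cancellation request ends the blocked reader.</div>

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <unistd.h>

/* Reader thread: blocks in read(), which is a POSIX cancellation point. */
static void *read_loop(void *arg)
{
    int fd = *(int *)arg;
    char buf[256];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n <= 0)
            break;  /* EOF or error: the connection is gone */
        /* ...evaluate the received code here... */
    }
    return NULL;
}

/* Canceler thread: receives a POINTER to the reader's pthread_t. */
static void *cancel_reader(void *threadPointer)
{
    pthread_t target = *(pthread_t *)threadPointer;
    /* ...the restore-detection loop would run here... */
    pthread_cancel(target);
    return NULL;
}

/* Returns 1 if the reader ended because it was cancelled, -1 on setup failure. */
int run_reader_with_canceler(int fd)
{
    pthread_t readThread, cancelThread;
    void *status = NULL;

    if (pthread_create(&readThread, NULL, read_loop, &fd) != 0)
        return -1;
    /* Pass &readThread, not (void *)readThread: cancel_reader dereferences it.
     * readThread lives on this stack frame until both threads are joined. */
    if (pthread_create(&cancelThread, NULL, cancel_reader, &readThread) != 0)
        return -1;

    pthread_join(cancelThread, NULL);
    pthread_join(readThread, &status);
    return status == PTHREAD_CANCELED;
}
```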
                                    <div>The rest of our code understands
                                      how to deal with a broken
                                      connection and can seamlessly
                                      reconnect. This all works well,
                                      but it seems like there is
                                      probably a better way, so I wanted
                                      to ask for suggestions. I also
                                      tried getting things to work with
                                      a file-based socket rather than a
                                      TCP socket, but that proved even
                                      more difficult (and was far more
                                      complicated in our architecture
                                      anyway, so I&#39;d prefer not to go
                                      back down that path).</div>
                                    <div><br>
                                    </div>
                                    <div>- Ross</div>
                                    <div><br>
                                    </div>
                                    <div>[1] From my other email thread:
                                      this video might help illustrate
                                      the actual process, in case my
                                      description isn&#39;t clear: </div>
                                    <div><br>
                                    </div>
                                    <div><a href="https://www.youtube.com/watch?v=F2L6JLFuFWs&amp;feature=youtu.be" target="_blank">https://www.youtube.com/watch?v=F2L6JLFuFWs&amp;feature=youtu.be</a><br>
                                    </div>
                                    <div><br>
                                    </div>
                                    <div><br>
                                    </div>
                                  </div>
                                  <br>
                                  <br>
                                </div>
                              </div>
                              <pre>_______________________________________________
CRIU mailing list
<a href="mailto:CRIU@openvz.org" target="_blank">CRIU@openvz.org</a>
<a href="https://lists.openvz.org/mailman/listinfo/criu" target="_blank">https://lists.openvz.org/mailman/listinfo/criu</a>
</pre>
                            </blockquote>
                            <br>
                          </div>
                        </blockquote>
                      </div>
                      <br>
                    </div>
                  </blockquote>
                  <br>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </div></div></div>

</blockquote></div><br></div>