[CRIU] Alternative to hacky resume detection

Ruslan Kuprieiev kupruser at gmail.com
Tue May 12 14:27:57 PDT 2015


I'm saying that you might want to consider calling criu_dump() from the
process that you are trying to dump. We call it a self dump[1]. For
example, using criu_dump() from libcriu, it might look like:

...
while (1) {
    ret = criu_dump();
    if (ret < 0) {
        /* error */
    } else if (ret == 0) {
        /* dump is ok */
    } else if (ret == 1) {
        /* This process is restored */
        /* reestablish connection or do whatever needs to be done
         * in case of broken connection */
    }
    /* accept connection and evaluate code */
}
...

[1] http://criu.org/Self_dump
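
A slightly fuller sketch of the same idea, in case it helps. It is only an
illustration under assumptions, not tested against your setup: the
HAVE_LIBCRIU macro, struct session, checkpoint(), dump_fn, the "dump.log"
name and the fake_dump_* helpers are all made up for this example. The
libcriu part is guarded so that the reconnect logic can be compiled and
tried without CRIU installed.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#ifdef HAVE_LIBCRIU   /* assumption: build with -DHAVE_LIBCRIU -lcriu */
#include <stdbool.h>
#include <criu/criu.h>

/* Set the options a self dump needs; images_fd is an open directory fd. */
static int setup_self_dump(int images_fd)
{
    if (criu_init_opts() < 0)
        return -1;
    criu_set_images_dir_fd(images_fd);
    criu_set_log_file("dump.log");   /* hypothetical log file name */
    /* Keep the process running after the dump; drop this if the
     * dumped process is supposed to exit once the images are written. */
    criu_set_leave_running(true);
    return 0;
}
#endif

/* Hypothetical per-process state: the socket to the current client. */
struct session {
    int conn_fd;   /* -1 when no client is connected */
};

/* Abstracts criu_dump() so the logic below can run without CRIU. */
typedef int (*dump_fn)(void);

/*
 * Checkpoint, then report whether the caller must accept a new client.
 * Returns -1 on dump error, 0 if the dump succeeded and execution simply
 * continued, 1 if we are now running as the restored process.
 */
static int checkpoint(struct session *s, dump_fn dump)
{
    int ret;

    assert(s != NULL && dump != NULL);

    ret = dump();
    if (ret < 0)
        return -1;
    if (ret == 0)
        return 0;

    /* Restored: the old peer is gone, so drop the stale descriptor. */
    if (s->conn_fd >= 0)
        close(s->conn_fd);
    s->conn_fd = -1;
    return 1;
}

/* Stand-ins for criu_dump(), used only to exercise the three branches. */
static int fake_dump_ok(void)       { return 0; }
static int fake_dump_restored(void) { return 1; }
static int fake_dump_failed(void)   { return -1; }
```

In the loop above, checkpoint(&s, criu_dump) would take the place of the
bare criu_dump() call, and a return value of 1 tells the loop to go back to
accept() instead of reading from the old socket. That would make the
time-checking thread unnecessary.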


On 05/12/2015 11:25 PM, Ross Boucher wrote:
> I'm not sure I follow. You're saying, the process that actually calls 
> restore would get notified? Or, are you saying that somehow in the 
> restored process I can access something set by criu?
>
> Assuming the former, I don't think that's necessary -- I already know 
> that I've just restored the process. I could try to send a signal from 
> the coordinating process and then use that signal to cancel the read 
> thread, which would be mostly the same thing. But because that would 
> have to travel through quite a few layers, it seems like it would be 
> better and more performant to do it from within the restored process 
> itself.
>
> Perhaps I am just misunderstanding your suggestion though.
>
>
> On Tue, May 12, 2015 at 12:37 PM, Ruslan Kuprieiev <kupruser at gmail.com> wrote:
>
>     Hi, Ross
>
>     When restoring using RPC or libcriu, the response message contains a
>     "restored" field set to true, which helps a process detect whether it
>     was restored. You say that every time you restore, the connection is
>     broken, right? So maybe you could utilize the "restored" flag?
>
>     Thanks,
>     Ruslan
>
>     On 05/12/2015 09:59 PM, Ross Boucher wrote:
>>     In order to get support working in my application, I've resorted
>>     to a hack that works but is almost certainly not the best way to
>>     do things. I'm interested if anyone has suggestions for a better
>>     way. First, let me explain how it works.
>>
>>     The process I'm checkpointing is a node.js process that opens a
>>     socket, and waits for a connection on that socket. Once
>>     established, the connecting process sends code for the node.js
>>     process to evaluate, in a loop. The node process is checkpointed
>>     between every message containing new code to evaluate.
>>
>>     Now, when we restore, it is always a completely new process
>>     sending code to the node.js process, so the built-in TCP socket
>>     restoration won't work. We had lots of difficulty figuring out
>>     how to detect that the socket connection had been broken.
>>     Ultimately, the hack we ended up using was to simply loop forever
>>     on a separate thread checking the time, and noticing if an
>>     unexplained huge gap in time had occurred. The looping thread
>>     looks like this:
>>
>>
>>         void * canceler(void * threadPointer)
>>         {
>>             pthread_t thread = *(pthread_t *)threadPointer;
>>
>>             time_t start,end;
>>             time(&start);
>>
>>             while(true)
>>             {
>>                 usleep(1000);
>>                 time(&end);
>>                 double diff = difftime(end,start);
>>
>>                 if (diff > 1.0) {
>>                     // THIS IS ALMOST CERTAINLY A RESTORE
>>                     break;
>>                 }
>>
>>                 // compare against the previous sample, not thread start
>>                 start = end;
>>             }
>>
>>             // cancel the read thread
>>
>>             int result = pthread_cancel(thread);
>>
>>             return NULL;
>>
>>         }
>>
>>
>>
>>     Elsewhere, in the code that actually does the reading, we spawn
>>     this thread with a handle to the read thread:
>>
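>>         // canceler() dereferences its argument as a pthread_t *,
>>         // so readThread has to point at the read thread's handle
>>         pthread_create(&cancelThread, NULL, canceler, (void *)readThread);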
>>
>>
>>
>>     The rest of our code understands how to deal with a broken
>>     connection and is able to seamlessly reconnect. This is all
>>     working well, but it seems like there is probably a better way so
>>     I wanted to ask for suggestions. I also tried getting things to
>>     work with a file based socket rather than a TCP socket, but that
>>     proved even more difficult (and was far more complicated in our
>>     architecture anyway, so I'd prefer not to return down that path).
>>
>>     - Ross
>>
>>     [1] From my other email thread, this video might help illustrate
>>     the actual process going on, if my description isn't that clear:
>>
>>     https://www.youtube.com/watch?v=F2L6JLFuFWs&feature=youtu.be
>>
>>
>>
>>
>
>
