[CRIU] Alternative to hacky resume detection

Ross Boucher rboucher at gmail.com
Tue May 12 14:36:06 PDT 2015


That's an interesting idea. Though, my process is inside of a docker
container, and I think it would get upset by being restored into a
different container. I think I need the coordination docker is doing in
order for my system to work.

On Tue, May 12, 2015 at 2:27 PM, Ruslan Kuprieiev <kupruser at gmail.com>
wrote:

>  I'm saying that you might want to consider calling criu_dump() from the
> process that you are trying to dump. We call it self dump[1]. For example,
> using criu_dump() from libcriu, it might look like:
>
> ...
> while (1) {
>     ret = criu_dump();
>     if (ret < 0) {
>         /* error */
>     } else if (ret == 0) {
>         /* dump is ok */
>     } else if (ret == 1) {
>         /* this process is restored */
>         /* reestablish connection or do whatever needs to be done
>          * in case of broken connection */
>     }
>     /* accept connection and evaluate code */
> }
> ...
>
> [1] http://criu.org/Self_dump
>
>
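A minimal sketch of how the three outcomes in the loop above could be pulled into a small helper, so the restore branch can be exercised without CRIU installed. The enum and the names `classify_dump_result` and `needs_reconnect` are hypothetical and not part of libcriu; the return-value convention (negative on error, 0 after a successful dump, 1 in the restored process) is taken from the snippet above.

```c
/* Hypothetical names for illustration; libcriu does not define these. */
enum dump_outcome {
    DUMP_FAILED,    /* ret < 0: the dump did not succeed */
    DUMP_DONE,      /* ret == 0: dumped, this is still the original run */
    DUMP_RESTORED   /* ret == 1: this process has just been restored */
};

/* Map the return value of a self dump onto an outcome, following the
 * convention in the quoted loop. Treating any other value as a failure
 * is an assumption of this sketch. */
enum dump_outcome classify_dump_result(int ret)
{
    if (ret == 0)
        return DUMP_DONE;
    if (ret == 1)
        return DUMP_RESTORED;
    return DUMP_FAILED;
}

/* Returns 1 when the caller should drop its old socket and reconnect. */
int needs_reconnect(int ret)
{
    return classify_dump_result(ret) == DUMP_RESTORED;
}
```

In the loop, the value returned by criu_dump() would be passed to `needs_reconnect()` before going back to accepting connections.
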
>
> On 05/12/2015 11:25 PM, Ross Boucher wrote:
>
> I'm not sure I follow. You're saying, the process that actually calls
> restore would get notified? Or, are you saying that somehow in the restored
> process I can access something set by criu?
>
>  Assuming the former, I don't think that's necessary -- I already know
> that I've just restored the process. I could try to send a signal from the
> coordinating process and then use that signal to cancel the read thread,
> which would be mostly the same thing. But because that would have to travel
> through quite a few layers, it seems like it would be better and more
> performant to do it from within the restored process itself.
>
>  Perhaps I am just misunderstanding your suggestion though.
>
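For comparison, the signal-based variant mentioned above could look roughly like the following. This is only a sketch under assumptions: SIGUSR1 is an arbitrary choice, and the function names are hypothetical. The handler is installed without SA_RESTART so that a blocking read() returns with EINTR instead of being restarted.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Set by the handler; checked by the reader after read() returns. */
static volatile sig_atomic_t restore_signalled = 0;

static void on_restore_signal(int signo)
{
    (void)signo;
    restore_signalled = 1;
}

/* Install the handler for SIGUSR1 (arbitrary choice for this sketch).
 * sa_flags is left at 0, so SA_RESTART is not set and a blocked read()
 * fails with EINTR when the signal arrives. Returns 0 on success. */
int install_restore_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_restore_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Lets other code ask whether the signal has been seen. */
int restore_was_signalled(void)
{
    return restore_signalled != 0;
}

/* Read once. Returns -2 (a value chosen for this sketch) when the read
 * was interrupted by the restore signal, so the caller can reconnect. */
ssize_t read_or_restore(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n < 0 && errno == EINTR && restore_signalled) {
        restore_signalled = 0;
        return -2;
    }
    return n;
}
```

In a multi-threaded program a process-directed signal can be delivered to any thread that does not block it, so the read thread would need to be the one left unblocked (or be targeted with pthread_kill()).
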
>
> On Tue, May 12, 2015 at 12:37 PM, Ruslan Kuprieiev <kupruser at gmail.com>
> wrote:
>
>>  Hi, Ross
>>
>> When restoring using RPC or libcriu, the response message contains a
>> "restored" field set to true, which helps the process detect whether it
>> was restored. You say that every time you restore, the connection is
>> broken, right? So maybe you could use the "restored" flag?
>>
>> Thanks,
>> Ruslan
>>
>> On 05/12/2015 09:59 PM, Ross Boucher wrote:
>>
>>  In order to get support working in my application, I've resorted to a
>> hack that works but is almost certainly not the best way to do things. I'm
>> interested if anyone has suggestions for a better way. First, let me
>> explain how it works.
>>
>>  The process I'm checkpointing is a node.js process that opens a socket,
>> and waits for a connection on that socket. Once established, the connecting
>> process sends code for the node.js process to evaluate, in a loop. The node
>> process is checkpointed between every message containing new code to
>> evaluate.
>>
>>  Now, when we restore, it is always a completely new process sending
>> code to the node.js process, so the built-in TCP socket restoration won't
>> work. We had lots of difficulty figuring out how to detect that the socket
>> connection had been broken. Ultimately, the hack we ended up using was to
>> simply loop forever on a separate thread checking the time, and noticing if
>> an unexplained huge gap in time had occurred. The looping thread looks like
>> this:
>>
>>
>>  #include <pthread.h>
>>  #include <stdbool.h>
>>  #include <time.h>
>>  #include <unistd.h>
>>
>>  void *canceler(void *threadPointer)
>>  {
>>      pthread_t thread = *(pthread_t *)threadPointer;
>>
>>      time_t last, now;
>>      time(&last);
>>
>>      while (true)
>>      {
>>          usleep(1000);
>>          time(&now);
>>          double diff = difftime(now, last);
>>
>>          if (diff > 1.0) {
>>              // THIS IS ALMOST CERTAINLY A RESTORE
>>              break;
>>          }
>>
>>          // remember this tick so only the gap between checks is measured
>>          last = now;
>>      }
>>
>>      // cancel the read thread
>>      pthread_cancel(thread);
>>
>>      return NULL;
>>  }
>>
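Since time() only has one-second resolution, a `diff > 1.0` check fires only once the gap reaches two whole seconds. A finer-grained version of the same heuristic could compare `struct timespec` samples instead. The sketch below is an illustration, not the code from the original application; all function names are hypothetical, and it assumes, as the hack above does, that the wall clock has moved on while the process sat in the checkpoint image.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Seconds elapsed from `earlier` to `later`, with nanosecond resolution. */
double elapsed_seconds(struct timespec earlier, struct timespec later)
{
    return (double)(later.tv_sec - earlier.tv_sec)
         + (double)(later.tv_nsec - earlier.tv_nsec) / 1e9;
}

/* True when the gap between two consecutive ticks exceeds the threshold,
 * which this heuristic interprets as "probably restored". */
bool looks_like_restore(struct timespec last_tick, struct timespec now,
                        double threshold_seconds)
{
    return elapsed_seconds(last_tick, now) > threshold_seconds;
}

/* Sample the wall clock; the polling loop would call this once per tick. */
struct timespec wall_clock_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return ts;
}
```

The polling thread would keep the previous sample, call `wall_clock_now()` after each sleep, and cancel the read thread once `looks_like_restore()` returns true.
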
>>
>>
>>  Elsewhere, in the code that actually does the reading, we spawn this
>> thread with a handle to the read thread:
>>
>>   pthread_create(&cancelThread, NULL, canceler, (void *)readThread);
>>
>>
>>
>>  The rest of our code understands how to deal with a broken connection
>> and is able to seamlessly reconnect. This is all working well, but it seems
>> like there is probably a better way so I wanted to ask for suggestions. I
>> also tried getting things to work with a file based socket rather than a
>> TCP socket, but that proved even more difficult (and was far more
>> complicated in our architecture anyway, so I'd prefer not to return down
>> that path).
>>
>>  - Ross
>>
>>  [1] From my other email thread, this video might help illustrate the
>> actual process going on, if my description isn't that clear:
>>
>>  https://www.youtube.com/watch?v=F2L6JLFuFWs&feature=youtu.be
>>
>>
>>
>>
>>  _______________________________________________
>> CRIU mailing list
>> CRIU at openvz.org
>> https://lists.openvz.org/mailman/listinfo/criu
>>
>>
>>
>
>