[CRIU] Alternative to hacky resume detection
Ruslan Kuprieiev
kupruser at gmail.com
Tue May 12 14:27:57 PDT 2015
I'm saying that you might want to consider calling criu_dump() from the
process that you are trying to dump. We call this a self dump [1]. For
example, using criu_dump() from libcriu, it might look like:
...
while (1) {
        ret = criu_dump();
        if (ret < 0) {
                /* error */
        } else if (ret == 0) {
                /* dump is ok */
        } else if (ret == 1) {
                /* this process was restored: re-establish the connection,
                 * or do whatever needs to be done for a broken connection */
        }

        /* accept connection and evaluate code */
}
...
[1] http://criu.org/Self_dump
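
The loop above can also be factored so that the reaction to each return
value lives in one place. The following is only a sketch: it assumes the
return convention described above (negative on error, 0 after a successful
dump, 1 when the process has just been restored), and it takes the dump
function as a parameter so it can be tried without a running CRIU service.
The names ckpt_outcome, checkpoint_once, loop_stats, handle_outcome and the
stub_* functions are made up for illustration and are not part of libcriu.

```c
/* Possible results of one checkpoint attempt. */
enum ckpt_outcome { CKPT_ERROR, CKPT_DUMPED, CKPT_RESTORED };

/* Same shape as criu_dump(): no arguments, int result. Passing it in
 * as a pointer is an assumption made so the sketch runs stand-alone. */
typedef int (*dump_fn)(void);

/* Call the dump function once and classify its return value. */
static enum ckpt_outcome checkpoint_once(dump_fn dump)
{
        int ret = dump();

        if (ret == 0)
                return CKPT_DUMPED;     /* images written, still running */
        if (ret == 1)
                return CKPT_RESTORED;   /* we are the restored copy */
        return CKPT_ERROR;              /* negative or unexpected value */
}

/* Hypothetical bookkeeping; a real server would act instead of count. */
struct loop_stats {
        int dumps;
        int restores;
        int errors;
};

/* Single place that decides what each outcome means for the server. */
static void handle_outcome(struct loop_stats *st, enum ckpt_outcome o)
{
        switch (o) {
        case CKPT_DUMPED:
                st->dumps++;
                break;
        case CKPT_RESTORED:
                /* here the old socket would be dropped and re-accepted */
                st->restores++;
                break;
        case CKPT_ERROR:
                st->errors++;
                break;
        }
}

/* Stand-ins for criu_dump() so each branch can be exercised. */
static int stub_dump_ok(void)  { return 0; }
static int stub_restored(void) { return 1; }
static int stub_failed(void)   { return -1; }
```

In the real program the call would be checkpoint_once(criu_dump), made
after libcriu has been configured, and the CKPT_RESTORED branch is where
the stale connection would be closed and a new one accepted.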
On 05/12/2015 11:25 PM, Ross Boucher wrote:
> I'm not sure I follow. You're saying, the process that actually calls
> restore would get notified? Or, are you saying that somehow in the
> restored process I can access something set by criu?
>
> Assuming the former, I don't think that's necessary -- I already know
> that I've just restored the process. I could try to send a signal from
> the coordinating process and then use that signal to cancel the read
> thread, which would be mostly the same thing. But because that would
> have to travel through quite a few layers, it seems like it would be
> better and more performant to do it from within the restored process
> itself.
>
> Perhaps I am just misunderstanding your suggestion though.
>
>
> On Tue, May 12, 2015 at 12:37 PM, Ruslan Kuprieiev
> <kupruser at gmail.com> wrote:
>
> Hi, Ross
>
> When restoring via RPC or libcriu, the response message contains a
> "restored" field set to true, which helps the process detect that it
> was restored. You say that the connection is broken every time you
> restore, right? So maybe you could use the "restored" flag?
>
> Thanks,
> Ruslan
>
> On 05/12/2015 09:59 PM, Ross Boucher wrote:
>> In order to get support working in my application, I've resorted
>> to a hack that works but is almost certainly not the best way to
>> do things. I'm interested if anyone has suggestions for a better
>> way. First, let me explain how it works.
>>
>> The process I'm checkpointing is a node.js process that opens a
>> socket, and waits for a connection on that socket. Once
>> established, the connecting process sends code for the node.js
>> process to evaluate, in a loop. The node process is checkpointed
>> between every message containing new code to evaluate.
>>
>> Now, when we restore, it is always a completely new process
>> sending code to the node.js process, so the built-in TCP socket
>> restoration won't work. We had lots of difficulty figuring out
>> how to detect that the socket connection had been broken.
>> Ultimately, the hack we ended up using was to simply loop forever
>> on a separate thread checking the time, and noticing if an
>> unexplained huge gap in time had occurred. The looping thread
>> looks like this:
>>
>>
>> void *canceler(void *threadPointer)
>> {
>>     pthread_t thread = *(pthread_t *)threadPointer;
>>
>>     time_t start, end;
>>     time(&start);
>>
>>     while (true)
>>     {
>>         usleep(1000);
>>         time(&end);
>>         double diff = difftime(end, start);
>>
>>         if (diff > 1.0) {
>>             // THIS IS ALMOST CERTAINLY A RESTORE
>>             break;
>>         }
>>
>>         // advance the baseline so that only a gap between two
>>         // consecutive samples counts, not total running time
>>         start = end;
>>     }
>>
>>     // cancel the read thread
>>     int result = pthread_cancel(thread);
>>
>>     return NULL;
>> }
>>
>>
>>
>> Elsewhere, in the code that actually does the reading, we spawn
>> this thread with a handle to the read thread:
>>
>>     pthread_create(&cancelThread, NULL, canceler, (void *)readThread);
>>
>>
>>
>> The rest of our code understands how to deal with a broken
>> connection and is able to seamlessly reconnect. This is all
>> working well, but it seems like there is probably a better way so
>> I wanted to ask for suggestions. I also tried getting things to
>> work with a file based socket rather than a TCP socket, but that
>> proved even more difficult (and was far more complicated in our
>> architecture anyway, so I'd prefer not to return down that path).
>>
>> - Ross
>>
>> [1] From my other email thread, this video might help illustrate
>> the actual process going on, if my description isn't that clear:
>>
>> https://www.youtube.com/watch?v=F2L6JLFuFWs&feature=youtu.be