<div dir="ltr">In order to get support working in my application, I&#39;ve resorted to a hack that works but is almost certainly not the best way to do things. I&#39;d be interested to hear if anyone has suggestions for a better way. First, let me explain how it works. <div><br></div><div>The process I&#39;m checkpointing is a node.js process that opens a socket and waits for a connection on it. Once the connection is established, the connecting process sends code for the node.js process to evaluate, in a loop. The node.js process is checkpointed between messages containing new code to evaluate. </div><div><br></div><div>Now, when we restore, it is always a completely new process sending code to the node.js process, so the built-in TCP socket restoration won&#39;t work. We had a lot of difficulty figuring out how to detect that the socket connection had been broken. Ultimately, the hack we ended up using was to loop forever on a separate thread, checking the time and noticing when an unexplained large gap in time had occurred. 
The looping thread looks like this:</div><div><br></div><div><br></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div><div>void * canceler(void * threadPointer)</div></div><div><div>{</div></div><div><div>    pthread_t thread = *(pthread_t *)threadPointer;</div></div><div><div><br></div></div><div><div>    time_t start,end;</div></div><div><div>    time(&amp;start);</div></div><div><div><br></div></div><div><div>    while(true)</div></div><div><div>    {</div></div><div><div>        usleep(1000);</div></div><div><div>        time(&amp;end);</div></div><div><div>        double diff = difftime(end,start);</div></div><div><div><br></div></div><div><div>        if (diff &gt; 1.0) {<br></div></div><div><div>            // THIS IS ALMOST CERTAINLY A RESTORE</div></div><div><div>            break;</div></div><div><div>        }</div></div><div><div>        start = end; // measure the gap since the previous iteration, not since thread start</div></div><div><div>    }</div></div><div><div><br></div></div><div><div>    // cancel the read thread<br></div></div></blockquote><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div><div>    int result = pthread_cancel(thread);</div></div><div><div><br></div></div><div><div>    return NULL;</div></div></blockquote><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div><div>}</div></div></blockquote><div><br></div><div><br></div><div>Elsewhere, in the code that actually does the reading, we spawn this thread with a handle to the read thread:</div><div><br></div><blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px"><div><div>pthread_create(&amp;cancelThread, NULL, canceler, (void *)readThread);</div></div></blockquote><div><br></div><div><br></div><div>The rest of our code understands how to deal with a broken connection and is able to reconnect seamlessly. This is all working well, but it seems like there is probably a better way, so I wanted to ask for suggestions. 
I also tried getting things to work with a file-based socket rather than a TCP socket, but that proved even more difficult (and was far more complicated in our architecture anyway, so I&#39;d prefer not to go back down that path).</div><div><br></div><div>- Ross</div><div><br></div><div>[1] From my other email thread, this video might help illustrate the actual process, in case my description isn&#39;t clear: </div><div><br></div><div><a href="https://www.youtube.com/watch?v=F2L6JLFuFWs&amp;feature=youtu.be">https://www.youtube.com/watch?v=F2L6JLFuFWs&amp;feature=youtu.be</a><br></div><div><br></div><div><br></div></div>