<p dir="ltr">By sending SIGCONT, I guess. <br></p>
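<p dir="ltr">A minimal sketch of what I mean, assuming the restored task is simply left in the stopped state. The child here is only a stand-in that stops itself with SIGSTOP; it is not an actual CRIU restore, and the <code>resume</code> and <code>demo</code> names are made up for illustration.</p>

```python
import os
import signal
import subprocess
import sys


def resume(pid: int) -> None:
    """Send SIGCONT to a stopped process so it starts running again."""
    os.kill(pid, signal.SIGCONT)


def demo():
    # Stand-in for a restored-but-stopped task (an assumption, not CRIU itself):
    # the child stops itself with SIGSTOP and exits with status 7 once resumed.
    child = subprocess.Popen(
        [
            sys.executable,
            "-c",
            "import os, signal, sys; "
            "os.kill(os.getpid(), signal.SIGSTOP); "
            "sys.exit(7)",
        ]
    )

    # WUNTRACED makes waitpid() return when the child stops, not only when it exits.
    _, status = os.waitpid(child.pid, os.WUNTRACED)
    stopped = os.WIFSTOPPED(status)

    # "Revive" the stopped task.
    resume(child.pid)

    # The child now runs to completion; collect its exit code.
    return stopped, child.wait()


if __name__ == "__main__":
    print(demo())
```

<p dir="ltr">In the real case the restored process would most likely not be a child of whoever wakes it up, so only the <code>os.kill(pid, signal.SIGCONT)</code> part applies; the <code>waitpid</code> calls are there just to make the demo deterministic.</p>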
<p dir="ltr">On Sep 4, 2015, at 5:11 AM, "Francisco Tolmasky" <<a href="mailto:tolmasky@gmail.com">tolmasky@gmail.com</a>> wrote:<br>
><br>
> Interesting, how would I then “revive” it if I add the final == TASK_STOPPED to that line? (Will respond about logs in another email, longer answer).<br>
><br>
> On Thu, Sep 3, 2015 at 5:48 PM, Ruslan Kuprieiev <<a href="mailto:kupruser@gmail.com">kupruser@gmail.com</a>> wrote:<br>
>><br>
>> Hi, Francisco,<br>
>><br>
>> On 04.09.15 02:05, Francisco Tolmasky wrote:<br>
>>><br>
>>> So I have been tracking a bug in tonic (related to this logging issue, and general “breaking” of pipes/streams), and I have narrowed part of the problem to the fact that we restore multiple containers simultaneously from the same source run. We do this to have them “warm” and ready in case the user wants to go back to a previous checkpoint. So something along these lines happens:<br>
>>><br>
>>> Program is running -> Checkpoint -> immediate restore IN PARALLEL to original program/restore from previous checkpoint IN PARALLEL as well.<br>
>>><br>
>>> So, you can end up with up to 3 copies of the same program running. Eventually that original one will die and we will choose one of the two “waiting” copies to pick up from.<br>
>>><br>
>>> So, my first question is whether you would expect things to start breaking in this scenario. They seem to work most of the time, but we see occasional failures over time, possibly in the form of stream breakages, or the program just getting “stuck” (I believe it gets stuck waiting on a pipe, though).<br>
>>><br>
>><br>
>> Could you provide some logs, please?<br>
>><br>
>>> My second question is, if this is in fact not expected to work well, would it be possible to “restore” a container but not “start” it? That is, load up the memory and get everything ready, but have it wait for a signal to actually kick off and get going. That way we can get most of the benefit of pre-warming these restores without having them all actually running at once.<br>
>>><br>
>><br>
>> That's a great question. I've been thinking about implementing --leave-stopped for restore, but never actually got around to it. I've tried just adding || opts.final_state == TASK_STOPPED to <a href="https://github.com/xemul/criu/blob/master/cr-restore.c#L1715">https://github.com/xemul/criu/blob/master/cr-restore.c#L1715</a> and it seems to work just fine with a test loop, though I'm not sure that it will always work in more complicated scenarios.<br>
>><br>
>> I've also added this task to the TODO list [1].<br>
>><br>
>> [1] <a href="http://criu.org/Todo">http://criu.org/Todo</a><br>
>><br>
>>> Thanks,<br>
>>><br>
>>> Francisco<br>
>>><br>
>>> -- <br>
>>> Francisco Tolmasky<br>
>>> <a href="http://www.tolmasky.com">www.tolmasky.com</a><br>
>>> <a href="mailto:tolmasky@gmail.com">tolmasky@gmail.com</a><br>
>>><br>
>>><br>
>>> _______________________________________________<br>
>>> CRIU mailing list<br>
>>> <a href="mailto:CRIU@openvz.org">CRIU@openvz.org</a><br>
>>> <a href="https://lists.openvz.org/mailman/listinfo/criu">https://lists.openvz.org/mailman/listinfo/criu</a><br>
>><br>
>><br>
><br>
></p>