<div dir="ltr">I have been tracking a bug in tonic (related to this logging issue and the general “breaking” of pipes/streams), and I have narrowed part of the problem down to the fact that we restore multiple containers simultaneously from the same source run. We do this to keep them “warm” and ready in case the user wants to go back to a previous checkpoint. So something along these lines happens:<div><br></div><div>Program is running -&gt; Checkpoint -&gt; immediate restore IN PARALLEL with the original program, plus a restore from the previous checkpoint, also IN PARALLEL.</div><div><br></div><div>So, you can end up with up to 3 copies of the same program running. Eventually the original one will die and we will choose one of the two “waiting” copies to pick up from.</div><div><br></div><div>So, my first question is whether you would expect things to start breaking in this scenario. They seem to work most of the time, but we see occasional failures over time, possibly in the form of stream breakages, or the program just getting “stuck” (I believe it gets stuck waiting on a pipe).</div><div><br></div><div>My second question is: if this is in fact not expected to work well, would it be possible to “restore” a container but not “start” it? That is, load up the memory and get everything ready, but have it wait for a signal to actually kick off and get going. That way we can get most of the benefit of pre-warming these restores without having them all actually running at once.</div><div><br></div><div>Thanks,</div><div><br></div><div>Francisco<br clear="all"><div><br></div>-- <br><div class="gmail_signature">Francisco Tolmasky<br><a href="http://www.tolmasky.com" target="_blank">www.tolmasky.com</a><br><a href="mailto:tolmasky@gmail.com" target="_blank">tolmasky@gmail.com</a></div>
</div></div>