[CRIU] Issues with restoring multiple instances of the same source

Francisco Tolmasky tolmasky at gmail.com
Thu Sep 3 16:05:09 PDT 2015


I have been tracking a bug in tonic (related to this logging issue, and to
pipes/streams “breaking” in general), and I have narrowed part of the
problem down to the fact that we restore multiple containers simultaneously
from the same source run. We do this to have them “warm” and ready in case
the user wants to go back to a previous checkpoint. So something along
these lines happens:

Program is running -> checkpoint -> immediately restore that checkpoint IN
PARALLEL with the original program, and restore the previous checkpoint IN
PARALLEL as well.

So, you can end up with up to 3 copies of the same program running.
Eventually that original one will die and we will choose one of the two
“waiting” copies to pick up from.

So, my first question is whether you would expect things to start breaking
in this scenario. They seem to work most of the time, but we see occasional
failures over time, possibly in the form of broken streams, or of a process
just getting “stuck” (I believe it gets stuck waiting on a pipe).

My second question is: if this is in fact not expected to work well, would
it be possible to “restore” a container but not “start” it? That is, load
up the memory and get everything ready, but have it wait for a signal to
actually kick off and get going. That way we could get most of the benefit
of pre-warming these restores without having them all actually running at
once.
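
As an illustration of the “restored but not started” idea only, here is a
minimal sketch using plain POSIX job-control signals rather than CRIU. It
freezes a freshly launched process with SIGSTOP and lets it go with SIGCONT.
The helper names start_paused and resume and the one-line workload are
hypothetical, and nothing here reflects how CRIU restores processes
internally.

```python
import os
import signal
import subprocess
import sys


def start_paused(argv):
    """Launch argv and freeze it at once (stand-in for "restored but not started")."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    os.kill(proc.pid, signal.SIGSTOP)
    # WUNTRACED makes waitpid report the stop, so we know the child is
    # really frozen before we return.
    _, status = os.waitpid(proc.pid, os.WUNTRACED)
    if not os.WIFSTOPPED(status):
        raise RuntimeError("child exited before it could be stopped")
    return proc


def resume(proc):
    """Send the "kick off" signal so the frozen copy starts running."""
    os.kill(proc.pid, signal.SIGCONT)


if __name__ == "__main__":
    # Hypothetical workload standing in for a restored container.
    warm = start_paused([sys.executable, "-c", "print('running')"])
    # The warm copy uses no CPU and touches no pipes until it is chosen.
    resume(warm)
    out, _ = warm.communicate()
    print(out.decode(), end="")
```

With something like this, only the copy that is eventually chosen would ever
read from or write to shared pipes, which is the property the pre-warmed
restores would need.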

Thanks,

Francisco

-- 
Francisco Tolmasky
www.tolmasky.com
tolmasky at gmail.com