[CRIU] Issues with restoring multiple instances of the same source
Ruslan Kuprieiev
kupruser at gmail.com
Thu Sep 3 17:48:41 PDT 2015
Hi, Francisco,
On 04.09.15 02:05, Francisco Tolmasky wrote:
> So I have been tracking a bug in tonic (related to this logging issue,
> and general “breaking” of pipes/streams), and I have narrowed part of
> the problem to the fact that we restore multiple containers
> simultaneously from the same source run. We do this to have them
> “warm” and ready in case the user wants to go back to a previous
> checkpoint. So something along these lines happens:
>
> Program is running -> Checkpoint -> immediate restore IN PARALLEL to
> original program/restore from previous checkpoint IN PARALLEL as well.
>
> So, you can end up with up to 3 copies of the same program running.
> Eventually that original one will die and we will choose one of the
> two “waiting” copies to pick up from.
>
> So, my first question is whether you would expect things to start
> breaking in this scenario. They seem to work most of the time, but we
> see occasional failures over time, possibly in the form of stream
> breakages, or the process just getting “stuck” (I believe it gets
> stuck waiting on a pipe).
>
Could you provide some logs, please?
> My second question is, if this is in fact not expected to work well,
> would it be possible to “restore” a container but not “start” it? That
> is, load up the memory and get everything ready, but have it wait for a
> signal to actually kick off and get going. That way we can get most
> of the benefit of pre-warming these restores without having them all
> actually running at once.
>
That's a great question. I've been thinking about implementing
--leave-stopped for restore, but never got around to it. I've tried
just adding || opts.final_state == TASK_STOPPED to
https://github.com/xemul/criu/blob/master/cr-restore.c#L1715 and it
seems to work just fine with a simple test loop, though I'm not sure
it will always work in more complicated scenarios.
I've also added this task to the TODO list [1].
[1] http://criu.org/Todo
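
As a rough illustration of the "restore but don't start" idea, here is a
minimal Python sketch. It does not call CRIU at all: an ordinary child
process stands in for a restored task, and spawn_stopped and resume are
hypothetical helper names made up for this example. It only shows the
stop/continue handshake (SIGSTOP, then SIGCONT) that a task left in the
stopped state would rely on.

```python
import os
import signal
import subprocess
import sys


def spawn_stopped(argv):
    """Start a child and freeze it at once (stand-in for a task restored stopped)."""
    child = subprocess.Popen(argv)
    os.kill(child.pid, signal.SIGSTOP)
    # WUNTRACED makes waitpid report the stop instead of blocking until exit.
    _, status = os.waitpid(child.pid, os.WUNTRACED)
    if not os.WIFSTOPPED(status):
        raise RuntimeError("child exited before it could be stopped")
    return child


def resume(child):
    """Send the "go" signal, wait for the child to finish, return its exit code."""
    os.kill(child.pid, signal.SIGCONT)
    return child.wait()


if __name__ == "__main__":
    # Hypothetical workload: a tiny child that exits with a known code.
    warm = spawn_stopped([sys.executable, "-c", "import sys; sys.exit(7)"])
    print("still waiting:", warm.poll() is None)
    print("exit code after resume:", resume(warm))
```

With several such "warm" copies parked in the stopped state, only the one
that is picked gets SIGCONT; the others can be killed without ever having
run in parallel with the original.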
> Thanks,
>
> Francisco
>
> --
> Francisco Tolmasky
> www.tolmasky.com <http://www.tolmasky.com>
> tolmasky at gmail.com <mailto:tolmasky at gmail.com>
>
>
> _______________________________________________
> CRIU mailing list
> CRIU at openvz.org
> https://lists.openvz.org/mailman/listinfo/criu