[CRIU] Issues with restoring multiple instances of the same source

Ruslan Kuprieiev kupruser at gmail.com
Thu Sep 3 17:48:41 PDT 2015


Hi, Francisco,

On 04.09.15 02:05, Francisco Tolmasky wrote:
> So I have been tracking a bug in tonic (related to this logging issue, 
> and general “breaking” of pipes/streams), and I have narrowed part of 
> the problem to the fact that we restore multiple containers 
> simultaneously from the same source run. We do this to have them 
> “warm” and ready in case the user wants to go back to a previous 
> checkpoint. So something along these lines happens:
>
> Program is running -> Checkpoint -> immediate restore IN PARALLEL to 
> original program/restore from previous checkpoint IN PARALLEL as well.
>
> So, you can end up with up to 3 copies of the same program running. 
> Eventually that original one will die and we will choose one of the 
> two “waiting” copies to pick up from.
>
> So, my first question is whether you would expect things to start 
> breaking in this scenario. They seem to work a lot of the time, but we 
> see occasional failures over time, possibly in the form of stream 
> breakages, or just getting “stuck” (I believe it gets stuck waiting on 
> a pipe, though).
>

Could you provide some logs, please?

> My second question is, if this is in fact not expected to work well, 
> would it be possible to “restore” a container but not “start” it? That 
> is, load up the memory and get everything ready, but have it wait for a 
> signal to actually kick off and get going. That way we can get most of 
> the benefit of pre-warming these restores without having them all 
> actually running at once.
>

That's a great question. I've been thinking about implementing 
--leave-stopped for restore, but never got around to it. I've tried 
just adding || opts.final_state == TASK_STOPPED to 
https://github.com/xemul/criu/blob/master/cr-restore.c#L1715 and it 
seems to work fine with a simple test loop, though I'm not sure that it 
will always work in more complicated scenarios.
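
To illustrate the "ready but not running" idea independently of CRIU, here 
is a minimal sketch using plain POSIX job-control signals. It is not CRIU 
code and does not call CRIU; the helper names (spawn_stopped, 
resume_and_wait, discard) are hypothetical. It assumes that a task left in 
the stopped state after restore would behave like any other stopped 
process, i.e. it sits idle until it receives SIGCONT and can still be 
killed with SIGKILL.

```python
import os
import signal


def spawn_stopped(work):
    """Fork a child that parks itself in the stopped state before running
    `work` (a callable returning an int exit code). Returns the child's pid."""
    pid = os.fork()
    if pid == 0:
        code = 1
        try:
            # Child: stop immediately; execution continues only after SIGCONT.
            os.kill(os.getpid(), signal.SIGSTOP)
            code = work()
        finally:
            # Never fall back into the parent's code path inside the child.
            os._exit(code)
    # Parent: WUNTRACED makes waitpid report the stop, so we know the
    # child is really parked before we return.
    _, status = os.waitpid(pid, os.WUNTRACED)
    if not os.WIFSTOPPED(status):
        raise RuntimeError("child did not enter the stopped state")
    return pid


def resume_and_wait(pid):
    """Resume a parked child with SIGCONT and return its exit code."""
    os.kill(pid, signal.SIGCONT)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


def discard(pid):
    """Throw away a parked child that is no longer needed.
    Returns True if it was terminated by a signal."""
    os.kill(pid, signal.SIGKILL)
    _, status = os.waitpid(pid, 0)
    return os.WIFSIGNALED(status)
```

With helpers like these, two "warm" copies could be parked with 
spawn_stopped(), one of them picked up later with resume_and_wait(), and 
the other dropped with discard(), so that only one copy is ever actually 
running at a time.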

I've also added this task to the TODO list [1].

[1] http://criu.org/Todo

> Thanks,
>
> Francisco
>
> -- 
> Francisco Tolmasky
> www.tolmasky.com
> tolmasky at gmail.com
>
>
> _______________________________________________
> CRIU mailing list
> CRIU at openvz.org
> https://lists.openvz.org/mailman/listinfo/criu
