[CRIU] Periodic checkpointing (using perf and signals?)
Zhengyu He
hzy at google.com
Wed Jul 17 13:18:13 EDT 2013
On Wed, Jul 17, 2013 at 8:57 AM, Pavel Emelyanov <xemul at parallels.com> wrote:
> On 07/17/2013 07:44 PM, Christopher Covington wrote:
> > Hi,
> >
> > I'm interested in taking checkpoints of processes from fast systems like
> > hardware and restoring them on really slow software models for performance
> > analysis.
>
> Great idea! I will add it on http://criu.org/Usage_scenarios :)
>
> > So far I've been able to save and restore checkpoints on the
> > different systems using CRIU. Now I'm looking for some way to trigger the
> > checkpointing. One basic use case might be to take a process that runs for,
> > say, 100M instructions and take a checkpoint every 10M instructions to be
> > restored as 10 parallel runs of the model.
> >
> > I'm thinking of trying to use performance counters to trigger such behavior.
> > Does perf already have support for triggering things like this?
>
> I'm not 100% sure, but I've seen examples of Python plugins for perf. From
> these examples, I believe that it's possible to write a plugin that will run
> some code after noticing 100M instructions.
>
This is definitely possible. You just need to register a signal handler and
configure your counter properly. Please see
http://web.eece.maine.edu/~vweaver/projects/perf_events/perf_event_open.html#lbAH
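
For illustration, here is a minimal sketch of that setup in C, assuming a
Linux system where perf_event_open is available. The function names
(on_overflow, init_insn_attr, arm_insn_signal) are made up for this example
and are not part of perf or CRIU. It counts retired user-space instructions
for a target pid and asks the kernel to signal the calling process each time
the counter passes the sample period:

```c
#define _GNU_SOURCE            /* for F_SETSIG */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/perf_event.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Incremented by the handler each time the counter overflows. */
static volatile sig_atomic_t overflow_count = 0;

void on_overflow(int signo)
{
    (void)signo;
    overflow_count++;
}

/* Describe a user-space instruction counter that overflows every
 * `period` instructions. The counter starts disabled. */
void init_insn_attr(struct perf_event_attr *attr, uint64_t period)
{
    assert(period > 0);
    memset(attr, 0, sizeof(*attr));
    attr->type = PERF_TYPE_HARDWARE;
    attr->size = sizeof(*attr);
    attr->config = PERF_COUNT_HW_INSTRUCTIONS;
    attr->sample_period = period;
    attr->disabled = 1;
    attr->exclude_kernel = 1;
    attr->exclude_hv = 1;
    attr->wakeup_events = 1;
}

/* Install the handler for `signo`, open the counter on `pid` (0 means the
 * calling process), route overflow notifications to this process as
 * `signo`, and enable counting.
 * Returns the perf fd, or -1 with errno set. */
int arm_insn_signal(pid_t pid, uint64_t period, int signo)
{
    struct perf_event_attr attr;
    struct sigaction sa;
    int fd, flags, saved;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_overflow;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, NULL) < 0)
        return -1;

    init_insn_attr(&attr, period);
    /* glibc has no wrapper for perf_event_open, so use syscall(). */
    fd = (int)syscall(SYS_perf_event_open, &attr, pid, -1, -1, 0UL);
    if (fd < 0)
        return -1;

    flags = fcntl(fd, F_GETFL);
    if (flags < 0 ||
        fcntl(fd, F_SETFL, flags | O_ASYNC) < 0 ||
        fcntl(fd, F_SETSIG, signo) < 0 ||
        fcntl(fd, F_SETOWN, getpid()) < 0 ||
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

A wrapper could call arm_insn_signal(pid, 10000000, SIGUSR1) and start its
checkpoint step whenever overflow_count changes. Monitoring another process
needs sufficient privileges (see /proc/sys/kernel/perf_event_paranoid).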
>
> > If not, I'm
> > thinking of trying to work in the ability to send a signal, like stop, to the
> > process of interest once the specified count, such as 10M instructions, has
> > been reached. CRIU or a wrapper could then wait for the process of interest to
> > stop, take the checkpoint, let the process continue, and then wait for it to
> > stop again or exit. Would such an approach make sense?
>
> It makes perfect sense! Several things to note from my side.
>
> 1. It's a perfect case where the --track-mem + --prev-images-dir options should
> be used. They will help subsequent dumps take MUCH less time, since with them
> CRIU will not take a full task dump, but instead will only grab what has
> changed since the last dump.
>
> 2. The current version of CRIU doesn't work with stopped tasks. We're currently
> developing it and this functionality will be available with v0.7 only. However,
> I think it's OK just to start the "criu dump" command after the perf trigger.
> The dump would work on a process that has done slightly more than 10M
> instructions, but that would be the same in case you send it a STOP signal.
>
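
To make the two points above concrete, here is a rough sketch in C of how a
wrapper might build the "criu dump" command for each successive checkpoint.
The helper name format_dump_cmd and the ckpt/N directory layout are
assumptions made for this example. The first dump is a full one, and each
later dump points --prev-images-dir at the previous directory, given relative
to the new images directory. --leave-running is included on the assumption
that the process should keep going after each dump:

```c
#include <stdio.h>

/* Write the criu command line for the n-th dump (0-based) of `pid` into
 * `buf`. Returns 0 on success, -1 if the command did not fit. */
int format_dump_cmd(char *buf, size_t len, int pid, int n)
{
    int written;

    if (n == 0) {
        /* First dump: nothing to diff against, so take a full dump. */
        written = snprintf(buf, len,
            "criu dump -t %d -D ckpt/%d --track-mem --leave-running",
            pid, n);
    } else {
        /* Later dumps: only save what changed since dump n-1. */
        written = snprintf(buf, len,
            "criu dump -t %d -D ckpt/%d --track-mem "
            "--prev-images-dir ../%d --leave-running",
            pid, n, n - 1);
    }
    return (written < 0 || (size_t)written >= len) ? -1 : 0;
}
```

The wrapper would create ckpt/N before each dump and run the resulting
command once per perf trigger.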
> > Thanks,
> > Christopher
>
> Thanks,
> Pavel