[Devel] Re: [RFC][PATCH] x86_86 support of checkpoint/restart (Re: Checkpoint / Restart)
Cedric Le Goater
clg at fr.ibm.com
Mon Feb 9 12:14:24 PST 2009
Mike Waychison wrote:
> Jim Winget wrote:
>> Any way to use a delayed checkpoint signal (perhaps somewhat
>> non-deterministic, e.g. "do it now" really means "do it pretty soon") that
>> is only taken on return to user space thus allowing a deterministic
>> solution?
>
> Ya, I'm thinking that a 'checkpoint' signal would be advisory, with the
> SIG_DFL action performing the checkpoint itself.
>
> Considering that we'd need to cleanly get access to all registers, the
> checkpoint itself needs to be a well defined path from
> userland->kernelland. I'm wondering if sys_checkpoint could be this
> well-defined path using the PTREGSCALL stub macro.
>
> For tasks that aren't checkpoint-aware, SIG_DFL could possibly be done
> by having the vsyscall page/vdso implement the userland sighandler that
> calls sys_checkpoint.
>
> What this means though is that we won't be able to freeze or SIGSTOP
> tasks before checkpoint.

The sys_checkpoint() call in the userland sighandler you are proposing is how
you would freeze all the tasks of a container. Once all the tasks have
entered sys_checkpoint() and are blocked on a wait queue, you can start
gathering state.

This means that you need to count how many tasks should enter sys_checkpoint().
The cgroup fork callback can be used to signal newcomers and maintain
a coherent count of tasks. But we would also need an exit callback, which
is not available.
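
The counting scheme can be illustrated with a small userspace sketch. This
is Python, not kernel code, and everything in it is an assumption made for
illustration: threads stand in for tasks, the class and method names are
hypothetical, and task_exited() models the exit callback that is missing
today. A task that exits after it has already arrived is not handled.

```python
import threading


class CheckpointRendezvous:
    """Toy model: tasks block in enter_checkpoint() until all have arrived."""

    def __init__(self, expected):
        self._cond = threading.Condition()
        self._expected = expected   # tasks that must enter the checkpoint
        self._arrived = 0           # tasks currently blocked in it
        self._released = False

    def task_forked(self):
        # Stand-in for the fork callback: one more task must show up.
        with self._cond:
            self._expected += 1

    def task_exited(self):
        # Stand-in for the (hypothetical) exit callback: one task fewer.
        with self._cond:
            self._expected -= 1
            self._cond.notify_all()

    def enter_checkpoint(self):
        # Called by each task: register arrival, then sleep until released.
        with self._cond:
            self._arrived += 1
            self._cond.notify_all()
            self._cond.wait_for(lambda: self._released)

    def wait_quiesced(self, timeout=None):
        # Called by the coordinator: True once every expected task is blocked.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._arrived >= self._expected, timeout)

    def release(self):
        # State has been gathered: let every task resume.
        with self._cond:
            self._released = True
            self._cond.notify_all()


def demo():
    rv = CheckpointRendezvous(expected=3)
    rv.task_forked()   # a newcomer appears: four tasks expected
    rv.task_exited()   # one task exits before arriving: back to three
    tasks = [threading.Thread(target=rv.enter_checkpoint) for _ in range(3)]
    for t in tasks:
        t.start()
    quiesced = rv.wait_quiesced(timeout=5)
    # ... state would be gathered here, while every task is blocked ...
    rv.release()
    for t in tasks:
        t.join()
    return quiesced
```

Without the task_exited() adjustment, the coordinator in demo() would wait
for a fourth task that never arrives, which is why the exit callback matters.
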
> Both of these paths can be entered via a
> variety of kernel entry points and unless we start dumping the full
> ptregs on each entry point, we'll never be able to reliably get access
> to all registers.
>
> sys_checkpoint itself would have to have its own method to quiesce all
> the tasks (basically wait for all tasks to enter sys_checkpoint so that
> a multi-task checkpoint is self-consistent).

Yes.

sys_restart() works the same way: all the tasks are told in advance how
many should enter the wait queue. Once the task state is restored, you
let each task restart from its signal handler, using the CPU state that
was saved on the user stack at checkpoint time.
> The nice thing about a signal too is that userland can block it and
> ignore it in a deterministic way.

Yes, and the *very* nice thing about a signal handler is that you don't have to
worry about your CPU state. I don't think it's a good idea to duplicate
this code in the C/R framework. It is *very* arch-dependent.
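
To make the "block it in a deterministic way" point concrete, here is a
minimal userspace sketch in Python. SIGCKPT does not exist, so SIGUSR1 is
used as a stand-in, and the handler only records that a checkpoint was
requested. The function and variable names are made up for the example,
and it assumes a POSIX system and that it runs in the main thread.

```python
import signal

SIGCKPT = signal.SIGUSR1   # stand-in: no real SIGCKPT signal exists

checkpoints_taken = []


def on_checkpoint(signum, frame):
    # A real handler would call into the checkpoint path; this one only
    # records the request.
    checkpoints_taken.append(signum)


def run_critical_section():
    """Defer a checkpoint request that arrives while the signal is blocked."""
    signal.signal(SIGCKPT, on_checkpoint)
    start = len(checkpoints_taken)

    # Enter a section that must not be checkpointed.
    signal.pthread_sigmask(signal.SIG_BLOCK, {SIGCKPT})
    signal.raise_signal(SIGCKPT)            # the request arrives now
    pending = SIGCKPT in signal.sigpending()
    taken_while_blocked = len(checkpoints_taken) - start

    # Leave the section: the pending request is delivered at this point.
    signal.pthread_sigmask(signal.SIG_UNBLOCK, {SIGCKPT})
    taken_after_unblock = len(checkpoints_taken) - start
    return pending, taken_while_blocked, taken_after_unblock
```

The request stays pending for as long as the task keeps the signal blocked,
and the handler runs only once the task unblocks it.
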
> The failure logic for ignored or blocked-for-a-long time can be pushed
> back down to userland.
>
> This is all a dramatic shift from the current way things are done, so
> we'd be best getting a better feel for our options though..

I think that the current way of doing things is a work in progress and needs
to be reviewed. The way checkpoint/restart is triggered has always been
controversial among the stakeholders.

We've been maintaining a C/R solution on ppc32, ppc64, x86, x86_64, ia64,
s390 and s390x since 2002, based on the principles you are describing above.
UNICOS and later IRIX used similar principles, following the POSIX draft
on checkpoint/restart.

For the signal, we have 'hijacked' SIGSTOP, but new signals SIGCKPT and
SIGRESTART would definitely be nicer for a mainline solution.

Cheers,
C.