[Devel] Re: [RFC][PATCH] x86_64 support of checkpoint/restart (Re: Checkpoint / Restart)
Jim Winget
winget at google.com
Mon Feb 9 09:53:16 PST 2009
Is there any way to use a delayed checkpoint signal (perhaps somewhat
non-deterministic, e.g. "do it now" really means "do it pretty soon") that
is only taken on return to user space, thus allowing a deterministic
solution?
Jim
On Fri, Feb 6, 2009 at 4:17 PM, Nauman Rafique <nauman at google.com> wrote:
>
> The patch sent by Masahiko assumes that all user-space registers are
> saved on the kernel stack on a system call. This is not true for the
> majority of system calls. The callee-saved registers (as defined by
> the x86_64 ABI) - rbx, rbp, r12, r13, r14, r15 - are saved only in some
> special cases. That means these registers would not be available to
> the checkpoint code. Moreover, the restore code would have no space on
> the stack to restore those registers.
>
> This patch partially solves that problem by using a stub around the
> checkpoint/restart system calls. This stub saves/restores those
> callee-saved registers to/from the kernel stack. This solves the
> problem in the case of self checkpoint and restore.
>
> In the case of an external checkpoint, there is no clean way to get
> access to these callee-saved registers. We freeze or SIGSTOP the
> process that has to be checkpointed. The process could have entered
> kernel space via any arbitrary code path before it was stopped or
> frozen. Thus the callee-saved registers were not saved in pt_regs
> (i.e. the bottom of the kernel-mode stack). They would be saved at
> some arbitrary place in the kernel-mode stack, so when we want to
> checkpoint that process, we cannot find those registers and save them
> in the checkpoint.
>
> Possible solutions to this external checkpointing problem include
> saving/restoring all registers on every kernel entry (not feasible,
> as it would impose a performance penalty on every code path) and
> overloading a signal to achieve external checkpointing. Any ideas?
> ---
>
>  arch/x86/include/asm/unistd_64.h |    4 ++--
>  arch/x86/kernel/entry_64.S       |   10 ++++++++++
>  arch/x86/mm/checkpoint.c         |    3 +--
>  arch/x86/mm/restart.c            |    5 ++---
>  4 files changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/include/asm/unistd_64.h b/arch/x86/include/asm/unistd_64.h
> index fe7174d..76aa903 100644
> --- a/arch/x86/include/asm/unistd_64.h
> +++ b/arch/x86/include/asm/unistd_64.h
> @@ -654,9 +654,9 @@ __SYSCALL(__NR_pipe2, sys_pipe2)
> #define __NR_inotify_init1 294
> __SYSCALL(__NR_inotify_init1, sys_inotify_init1)
> #define __NR_checkpoint 295
> -__SYSCALL(__NR_checkpoint, sys_checkpoint)
> +__SYSCALL(__NR_checkpoint, stub_checkpoint)
> #define __NR_restart 296
> -__SYSCALL(__NR_restart, sys_restart)
> +__SYSCALL(__NR_restart, stub_restart)
>
>
> #ifndef __NO_STUBS
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index b86f332..0369267 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -545,6 +545,14 @@ END(system_call)
> END(\label)
> .endm
>
> + .macro FULLSTACKCALL label,func
> + .globl \label
> + \label:
> + leaq \func(%rip),%rax
> + jmp ptregscall_common
> + END(\label)
> + .endm
> +
> CFI_STARTPROC
>
> PTREGSCALL stub_clone, sys_clone, %r8
> @@ -552,6 +560,8 @@ END(\label)
> PTREGSCALL stub_vfork, sys_vfork, %rdi
> PTREGSCALL stub_sigaltstack, sys_sigaltstack, %rdx
> PTREGSCALL stub_iopl, sys_iopl, %rsi
> + FULLSTACKCALL stub_restart, sys_restart
> + FULLSTACKCALL stub_checkpoint, sys_checkpoint
>
> ENTRY(ptregscall_common)
> popq %r11
> diff --git a/arch/x86/mm/checkpoint.c b/arch/x86/mm/checkpoint.c
> index 2514f14..a26332d 100644
> --- a/arch/x86/mm/checkpoint.c
> +++ b/arch/x86/mm/checkpoint.c
> @@ -75,10 +75,10 @@ static void cr_save_cpu_regs(struct cr_hdr_cpu *hh, struct task_struct *t)
> hh->ip = regs->ip;
> hh->cs = regs->cs;
> hh->flags = regs->flags;
> + hh->sp = regs->sp;
> hh->ss = regs->ss;
>
> #ifdef CONFIG_X86_64
> - hh->sp = read_pda (oldrsp);
> hh->r8 = regs->r8;
> hh->r9 = regs->r9;
> hh->r10 = regs->r10;
> @@ -90,7 +90,6 @@ static void cr_save_cpu_regs(struct cr_hdr_cpu *hh, struct task_struct *t)
> hh->ds = thread->ds;
> hh->es = thread->es;
> #else /* !CONFIG_X86_64 */
> - hh->sp = regs->sp;
> hh->ds = regs->ds;
> hh->es = regs->es;
> #endif /* CONFIG_X86_64 */
> diff --git a/arch/x86/mm/restart.c b/arch/x86/mm/restart.c
> index a10d63e..329f938 100644
> --- a/arch/x86/mm/restart.c
> +++ b/arch/x86/mm/restart.c
> @@ -111,15 +111,14 @@ static int cr_load_cpu_regs(struct cr_hdr_cpu *hh, struct task_struct *t)
> regs->cs = hh->cs;
> regs->flags = hh->flags;
> regs->sp = hh->sp;
> - write_pda(oldrsp, hh->sp);
> regs->ss = hh->ss;
>
> - thread->gs = hh->gs;
> - thread->fs = hh->fs;
> #ifdef CONFIG_X86_64
> do_arch_prctl(t, ARCH_SET_FS, hh->fs);
> do_arch_prctl(t, ARCH_SET_GS, hh->gs);
> #else
> + thread->gs = hh->gs;
> + thread->fs = hh->fs;
> loadsegment(gs, hh->gs);
> loadsegment(fs, hh->fs);
> #endif
>
>
>
>
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers