[Devel] Re: [PATCH 1/3] powerpc: bare minimum checkpoint/restart implementation

Thu Mar 12 20:36:49 PDT 2009

Nathan Lynch wrote:
> On Tue, 24 Feb 2009 13:58:26 -0600
> "Serge E. Hallyn" <serue at us.ibm.com> wrote:
> 
>> Quoting Nathan Lynch (ntl at pobox.com):
>>> Nathan Lynch <ntl at pobox.com> wrote:
>>>> Oren Laadan wrote:
>>>>> Nathan Lynch wrote:
>>>>>> What doesn't work:
>>>>>> * restarting a 32-bit task from a 64-bit task and vice versa
>>>>> Is there a test to bail if we attempt to checkpoint such tasks ?
>>>> No, but I'll add one if it looks too hard to fix for the next round.
>>> Unfortunately, adding a check for this is hard.
>>>
>>> The "point of no return" in the restart path is cr_read_mm, which tears
>>> down current's address space.  cr_read_mm runs way before cr_read_cpu,
>>> which is the only restart method I've implemented for powerpc so far.
>>> So, checking for this condition in cr_read_cpu is too late if I want
>>> restart(2) to return an error and leave the caller's memory map
>>> intact.  (And I do want this: restart should be as robust as execve.)
>>>
>>> Well okay then, cr_read_head_arch seems to be the right place in the
>>> restart sequence for the architecture code to handle this.  However,
>>> cr_write_head_arch (which produces the buffer that cr_read_head_arch
>>> consumes) is not provided a reference to the task to be checkpointed,
>>> nor can it assume that it's operating on current.  I need a reference
>>> to a task before I can determine whether it's running in 32- or 64-bit
>>> mode, or using the FPU, Altivec, SPE, whatever.
>>>
>>> In any case, mixing 32- and 64-bit tasks across restart is something I
>>> eventually want to support, not reject.  But the problem I've outlined
>>> applies to FPU state and vector extensions (VMX, SPE), as well as
>>> sanity-checking debug register (DABR) contents.  We'll need to be able
>>> to error out gracefully from restart when a checkpoint image specifies a
>>> feature unsupported by the current kernel or hardware.  But I don't see
>>> how to do it with the current architecture.  Am I missing something?
>> I suspect I can guess the response to this suggestion, but how about we
>> accept that if sys_restart() fails due to something like this, the
>> task is lost and can't exit gracefully?
> 
> In the short term it might be necessary.  But the restart code should
> forcibly kill the task instead of returning an error back up to
> userspace in this case.  Once the memory map of the process has been
> altered, there is no point in allowing it to continue (and likely dump
> a useless core).  Btw, this failure mode seems to apply when
> cr_read_files() fails, too...
> 
> But in the long term, things need to be more robust (e.g. restart(2)
> returns ENOEXEC without messing with current->mm).  I think it's worth
> looking at how execve operates... if I understand correctly, it sets up
> a new mm_struct disconnected from the current task and activates it at
> the last moment.
> 

That's a good idea, and I have considered it in the past.

However, it is easier to restart a task in its own, new context,
including the MM. For instance, you can leverage all the memory syscalls.

An in-between approach would be to switch to the new MM without tearing
down the original one, saving it alongside instead. If a failure occurs,
restore it.
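
A rough userspace sketch of that save-and-restore idea, just to make the
control flow concrete. Everything here is made up for illustration:
toy_mm, toy_task and toy_restart_mm are hypothetical stand-ins, not the
kernel's mm_struct or task_struct, and nothing below is from the patch set.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; NOT the kernel's mm_struct or task_struct. */
struct toy_mm {
    int id;
};

struct toy_task {
    struct toy_mm *mm;
};

/*
 * Hypothetical restore step that runs in the new context.
 * Returns 0 on success, nonzero on failure.
 */
typedef int (*toy_restore_fn)(struct toy_task *task);

/*
 * Switch the task to new_mm while keeping the original alongside.
 * Takes ownership of new_mm (assumed to be heap-allocated).
 * On failure: the original is put back and new_mm is discarded.
 * On success: the original is released only at the very end.
 */
int toy_restart_mm(struct toy_task *task, struct toy_mm *new_mm,
                   toy_restore_fn restore)
{
    struct toy_mm *saved = task->mm;   /* keep it, do not tear it down */
    int err;

    task->mm = new_mm;                 /* run the restore in the new context */
    err = restore(task);
    if (err) {
        task->mm = saved;              /* failure: restore the original */
        free(new_mm);
        return err;
    }

    free(saved);                       /* success: now drop the original */
    return 0;
}

/* Two trivial restore steps, used only to exercise both paths. */
int toy_restore_ok(struct toy_task *task)   { (void)task; return 0; }
int toy_restore_fail(struct toy_task *task) { (void)task; return -1; }
```

Calling toy_restart_mm() with toy_restore_fail leaves task->mm pointing at
the original; with toy_restore_ok the task ends up on the new one.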

Then you'll have to ask the same question about all the other
resources: signal handlers, open files, etc. If you want the operation
to be non-intrusive in the case of an error, either all of the changes
take effect atomically, or none of them do.
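
To illustrate what "all or none" would mean across several resources,
here is a minimal sketch that stages every piece off to the side and
commits only once all of them have been built. The names (toy_state,
toy_build, toy_restart_all) and the fail_at failure-injection parameter
are assumptions for the example, not anything in the checkpoint/restart
code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for per-task resources; not kernel types. */
struct toy_state {
    int *mm;        /* stands in for the memory map */
    int *files;     /* stands in for the open-file table */
    int *sighand;   /* stands in for the signal handlers */
};

/* Build one staged resource; fail_at lets the caller inject a failure. */
int *toy_build(int step, int fail_at)
{
    int *p;

    if (step == fail_at)
        return NULL;
    p = malloc(sizeof(*p));
    if (p)
        *p = step;
    return p;
}

/* Free every resource in s; free(NULL) is a no-op, so partial sets are fine. */
void toy_release(struct toy_state *s)
{
    free(s->mm);
    free(s->files);
    free(s->sighand);
    s->mm = s->files = s->sighand = NULL;
}

/*
 * Stage all resources, then commit them together.
 * fail_at: 1, 2 or 3 fails at that step; 0 means no injected failure.
 * If any step fails, *cur is left exactly as it was.
 */
int toy_restart_all(struct toy_state *cur, int fail_at)
{
    struct toy_state staged = { NULL, NULL, NULL };
    struct toy_state old;

    staged.mm = toy_build(1, fail_at);
    if (!staged.mm)
        goto fail;
    staged.files = toy_build(2, fail_at);
    if (!staged.files)
        goto fail;
    staged.sighand = toy_build(3, fail_at);
    if (!staged.sighand)
        goto fail;

    old = *cur;          /* commit point: nothing visible changed before here */
    *cur = staged;
    toy_release(&old);
    return 0;

fail:
    toy_release(&staged);  /* discard whatever was staged so far */
    return -1;
}
```

The cost is visible even in the toy version: every resource needs a
"build it detached" path, which is exactly the work that restarting in a
fresh context avoids.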

However, I do think that this is not necessary: the tasks that are
doing the restart have been created from scratch for that purpose,
so they need not return any specific value to the user. It is the
task that initiates the restart that needs to handle errors gracefully.
The scheme I proposed in the previous email does exactly that.
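
As a loose illustration of that division of labor (not the scheme from
the previous email, which is not reproduced here), a coordinator can
create a throwaway task, let it attempt the restart, and learn the
outcome from its wait status. The sketch below uses plain POSIX fork()
and waitpid(); toy_coordinate_restart and the toy_step_* functions are
hypothetical names, and a successful restart is modeled simply as the
child exiting with status 0.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical restart step run inside the freshly created task.
 * Returns 0 on success, nonzero on failure.
 */
typedef int (*toy_child_fn)(void);

/*
 * Create a task whose only purpose is to restart, and report the
 * outcome to the initiator. Returns 0 if the child succeeded, -1 if it
 * failed, was killed, or could not be created.
 */
int toy_coordinate_restart(toy_child_fn restart_step)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: its state may be half overwritten on failure, so it
         * is killed rather than asked to return an error. */
        if (restart_step() != 0) {
            raise(SIGKILL);
            _exit(1);          /* not reached; fallback only */
        }
        _exit(0);              /* toy model of a successful restart */
    }

    /* Initiator: this is where the error is handled gracefully. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 0;
    return -1;
}

/* Two trivial steps, used only to exercise both outcomes. */
int toy_step_ok(void)   { return 0; }
int toy_step_fail(void) { return -1; }
```

The initiator's own memory map, files and signal handlers are never
touched, so it can report the failure however it likes.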

(This does not apply to self-restart, for obvious reasons, but that
is a special case anyway).

Oren.

_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers