[Devel] Re: [PATCH RFC] s390: let tasks know to restart syscalls after sys_restart()
Oren Laadan
orenl at cs.columbia.edu
Tue Feb 9 13:12:11 PST 2010
Serge E. Hallyn wrote:
> Quoting Oren Laadan (orenl at cs.columbia.edu):
>>
>> Serge E. Hallyn wrote:
>>> Quoting Oren Laadan (orenl at cs.columbia.edu):
>>>> Serge E. Hallyn wrote:
>>>>> (This is a patch against the checkpoint/restart kernel tree at
>>>>> http://git.ncl.cs.columbia.edu/?p=linux-cr.git;a=shortlog;h=refs/heads/ckpt-v19-rc2.9)
>>>>>
>>>>> On x86, do_signal() leaves -516 in eax while it freezes, which
>>>>> sys_restart() can use to detect that it should restart the
>>>>> syscall which was interrupted by a signal (or the freezer).
>>>>>
>>>>> On s390, gprs[2] gets tweaked to -EINTR (-4) instead, leaving
>>>>> us no reliable way to tell whether the syscall should be restarted. If the
>>>>> task is checkpointed here and then restarted, then the last part
>>>>> of do_signal() won't be done, and the interrupted syscall won't
>>>>> be restarted.
>>>>>
>>>>> This patch defines TIF_RESTARTBLOCK as a thread flag showing that
>>>>> the thread expects to be frozen while kicked out of a restartable
>>>>> syscall by a signal.
>>>>>
>>>>> The TIF_RESTARTBLOCK flag is only set for the duration of the
>>>>> get_signal_to_deliver() call, which is where the task may get
>>>>> frozen. We also set it in sys_restart() if the checkpointed task
>>>>> had had TIF_RESTARTBLOCK set. It will get cleared if upon exiting
>>>>> sys_restart() we jump to sysc_sigpending. If instead we jump back
>>>>> to do_signal(), then TIF_RESTARTBLOCK will stay set again until
>>>>> after get_signal_to_deliver() so that if it immediately freezes and
>>>>> is re-checkpointed, the resulting second checkpoint image again
>>>>> will have TIF_RESTARTBLOCK set.
>>>> Two comments:
>>>>
>>>> 1) This note will be lost once we fold this patch into a clean
>>>> patchset. Can you please make it part of the code ?
>>> Sure, good idea.
>>>
>>>> 2) Maybe I'm missing something, but I'm not convinced. Can you
>>>> elaborate on why this is correct in different cases ? Also, in
>>>> particular with respect to the post-signal-sent snippet in the
>>>> signal handling code:
>>>>
>>>>     if (retval == -ERESTART_RESTARTBLOCK
>>>>         && regs->psw.addr == continue_addr) {
>>>>             regs->gprs[2] = __NR_restart_syscall;
>>>>             set_thread_flag(TIF_RESTART_SVC);
>>>>     }
>>>>
>>>>
>>>> Would it do what you expect after a restart ? (restart modifies
>>>> the psw.addr)
>>> I don't understand the question. After sys_restart(), we won't be
>>> returning to this kernel code. We'll either immediately call
>>> restart_syscall(), or, if a signal was delivered before sys_restart()
>>> completed, then do_signal() will start again from the top.
>> Ok, I re-read the code: let's look at these cases:
>>
>> case 1: checkpointee wasn't in syscall -- no problem.
>>
>> case 2: checkpointee was in syscall, no signal pending; when it was
>> frozen, regs->svcnr became 0, and that's what we save, so on restart
>> we won't enter that snippet again. Again, no problem.
>>
>> case 3: checkpointee was in syscall, signal pending;
>> case 4: checkpointee was in syscall, signal received at restart;
>> look at this snippet:
>>
>>     if (signr > 0 && regs->psw.addr == restart_addr) {
>>             if (retval == -ERESTARTNOHAND
>>                 || (retval == -ERESTARTSYS
>>                     && !(current->sighand->action[signr-1].sa.sa_flags
>>                          & SA_RESTART))) {
>>                     regs->gprs[2] = -EINTR;
>>                     regs->psw.addr = continue_addr;
>>             }
>>     }
>>
>> Because svcnr is/was 0, neither restart_addr nor continue_addr
>> were set up, so this condition is always false, which I think is
>> wrong.
>
> I've been focusing on the ERESTART_RESTARTBLOCK case. Can
> we agree that all cases appear to be handled correctly there?
Errr... every time I need to go and read all the code from
scratch to convince myself... and my mind just context
switched :(
>
> For the ERESTARTSYS/ERESTARTNOHAND case, I'm probably not
> doing the right thing. For a single checkpoint, since either
> there was no real signal (freezer) or it didn't get handled
> before checkpoint, psw.addr gets checkpointed and restored
> as restart_addr, which is the right thing. (since signr is
> not >0, we would have kept the values the same after
> get_signal_to_deliver()).
>
> But if a real signal gets delivered upon exit of sys_restart(),
> then I do think we'll end up doing the wrong thing -
> we'll restart the interrupted system call with the orig_gpr2,
> so we'll pretend the signal did not get delivered, rather
> than proceed past the call to the system call (in userspace)
> with return value -EINTR. (Just how wrong is that?)
>
> This is all dense enough that it may be worth thinking of
> a different way to handle it, but I'm not sure what that
> way would be. The challenge is finding a *simple*, reliable
> way to detect what the initial conditions to do_signal()
> were, based on the register/thread_info values as they are
> at do_signal()->get_signal_to_deliver()->try_to_freeze(),
> given the ways the values get swapped in the block above
> the get_signal_to_deliver() call.
>
> The simplest thing by far would be if we could safely
> move the get_signal_to_deliver() call before the
>
>     if (regs->svcnr) {
>             continue_addr = regs->psw.addr;
>             ...
>
> block. I assume there are entry_64.S-related reasons why
> we cannot?
Yes, that's what I was thinking. And I don't know enough of
s390 to understand why we could not.
Either that or record the state before it is modified in
dedicated checkpoint-related fields (e.g. per thread).
Oren.