[CRIU] few remaining s390 zdtm failures (for me)

Adrian Reber adrian at lisas.de
Mon Aug 14 12:01:53 MSK 2017


On Thu, Aug 10, 2017 at 04:14:35PM +0300, Pavel Emelyanov wrote:
> On 08/10/2017 03:09 PM, Adrian Reber wrote:
> > On Thu, Aug 10, 2017 at 02:37:55PM +0300, Pavel Emelyanov wrote:
> >> On 08/09/2017 09:32 PM, Adrian Reber wrote:
> >>> On Wed, Aug 09, 2017 at 07:11:45PM +0300, Pavel Emelyanov wrote:
> >>>> On 08/08/2017 02:40 PM, Adrian Reber wrote:
> >>>>> Running on RHEL s390x I have few zdtm test cases failing which do not
> >>>>> fail on x86_64 and ppc64le.
> >>>>>
> >>>>>   * zdtm/static/vdso02(ns)
> >>>>>   * zdtm/static/stopped(h)
> >>>>>   * zdtm/static/stopped01(h)
> >>>>>   * zdtm/static/stopped02(h)
> >>>>>   * zdtm/static/stopped12(ns)
> >>>>>
> >>>>> Error messages are all the same like:
> >>>>>
> >>>>> (01.503563) cg: Set 1 is criu one
> >>>>> (01.503686) Seized task 24, state 1
> >>>>> (01.503715) Collected (4 attempts, 0 in_progress)
> >>>>> (01.503728) Seized task 25, state 0
> >>>>> (01.503912) Error (compel/src/lib/infect.c:180): SEIZE 25: task not stopped after seize
> >>>>
> >>>> Can you put task_is_trapped()-like messages in this place so that we could
> >>>> see what's wrong with this task?
> >>>
> >>> Like this?
> >>>
> >>> (00.034673) Error (compel/src/lib/infect.c:1446): Task 25 is in unexpected state: 200
> >>> (00.034675) Error (compel/src/lib/infect.c:1448): Task exited with 2
> >>
> >> Looks like yes, but ... hm ... exited. Which exact test is that? Or any of the above
> >> behaves like that?
> > 
> > Console output: https://lisas.de/~adrian/s390-zdtm-failures
> 
> So it looks like child process just wakes up and exits. AFAIU this is behavior on
> some older kernels, while on more recent it just works. I'd suggest looking for some
> kernel patches that fix/tune/affect behavior of task stopped/trapped states for s390.
> 
> You say this is 3.10.x, right?

This error has been fixed upstream since kernel 4.2, and with Oleg's help
we narrowed it down to:

commit e7cc4173115347bcdaa5de2824dd46ef2c58425f
Author: Palmer Dabbelt <palmer at dabbelt.com>
Date:   Thu Apr 30 21:19:55 2015 -0700

    signals, ptrace, sched: Fix a misaligned load inside ptrace_attach()

    [...]

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4f066cb625ad..fb650a2f4a73 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1374,7 +1374,7 @@ struct task_struct {
        int exit_state;
        int exit_code, exit_signal;
        int pdeath_signal;  /*  The signal sent when the parent dies  */
-       unsigned int jobctl;    /* JOBCTL_*, siglock protected */
+       unsigned long jobctl;   /* JOBCTL_*, siglock protected */
 
        /* Used for emulating ABI behavior of previous Linux versions */
        unsigned int personality;
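
One possible way to see why the field width matters on s390x (this is an illustration, not taken from the commit message, which is elided above): the kernel's bit helpers work on unsigned long words, and s390x is a 64-bit big-endian architecture. If a 32-bit field is accessed as part of a 64-bit word, a bit set at index N in the field does not show up at index N of the word on big-endian, while it does on little-endian. The sketch below simulates both byte orders in plain user-space C so it behaves the same on any host. DEMO_BIT and all function names are hypothetical and chosen only for this example.

```c
#include <stdint.h>

/* Hypothetical bit index, picked only for illustration. */
#define DEMO_BIT 19

/* Store the low `nbytes` bytes of `v` at `p` in the requested byte order. */
static void store_field(unsigned char *p, uint64_t v, int nbytes, int big_endian)
{
    for (int i = 0; i < nbytes; i++) {
        int shift = big_endian ? 8 * (nbytes - 1 - i) : 8 * i;
        p[i] = (unsigned char)((v >> shift) & 0xff);
    }
}

/* Load 8 bytes from `p` as one 64-bit word in the requested byte order. */
static uint64_t load_word64(const unsigned char *p, int big_endian)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        int shift = big_endian ? 8 * (7 - i) : 8 * i;
        v |= (uint64_t)p[i] << shift;
    }
    return v;
}

/*
 * Set DEMO_BIT in a field of `field_bytes` bytes (4 models unsigned int,
 * 8 models unsigned long on a 64-bit machine), followed by zeroed memory.
 * Then read the 8 bytes at the field's address as a 64-bit word and
 * report whether DEMO_BIT is set in that word.
 */
static int bit_visible_as_long(int big_endian, int field_bytes)
{
    unsigned char mem[8] = { 0 };

    store_field(mem, UINT64_C(1) << DEMO_BIT, field_bytes, big_endian);
    return (int)((load_word64(mem, big_endian) >> DEMO_BIT) & 1);
}
```

With a 4-byte field, bit_visible_as_long(0, 4) reports the bit as set (little-endian), whereas bit_visible_as_long(1, 4) does not (big-endian), because the field lands in the upper half of the word. With an 8-byte field, as after the patch, both byte orders agree. This would be consistent with the failure showing up on s390 but not on x86_64 or ppc64le.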


I now have no more errors on s390 with my 3.10.x kernel. Thanks everyone!

		Adrian

