[CRIU] [PATCH] s390: Prevent GOT relocations
Michael Holzheu
holzheu at linux.vnet.ibm.com
Wed Jul 19 12:00:15 MSK 2017
On Tue, 18 Jul 2017 17:22:20 +0200,
Adrian Reber <areber at redhat.com> wrote:
> On Mon, Jul 17, 2017 at 08:44:30PM +0200, Michael Holzheu wrote:
> > On Mon, 17 Jul 2017 19:45:36 +0200,
> > Adrian Reber <areber at redhat.com> wrote:
> >
> > > On Mon, Jul 17, 2017 at 07:21:21PM +0200, Michael Holzheu wrote:
> > > > On Mon, 17 Jul 2017 10:07:23 +0200,
> > > > Adrian Reber <areber at redhat.com> wrote:
> > > >
> > > > > On Fri, Jul 14, 2017 at 02:56:26PM +0200, Michael Holzheu wrote:
> > > > > > On Fri, 14 Jul 2017 14:08:31 +0200,
> > > > > > Adrian Reber <areber at redhat.com> wrote:
> > > > > >
> > > > > > > Thanks for the patch. I tried it on my s390 test system and I get the
> > > > > > > following error now:
> > > > > >
> > > > > > Ok, fine - at least we don't see the compiler error any more.
> > > > > >
> > > > > > >
> > > > > > > (00.002625) f15 0000000000000000
> > > > > > > (00.002626) No VXRS
> > > > > > > (00.002628) Putting tsock into pid 24
> > > > > > > (00.002639) ptrace_set_regs: pid=24
> > > > > > > (00.002656) Error (compel/src/lib/infect.c:633): Unable to connect a transport socket: Function not implemented
> > > > > > > (00.002665) Error (compel/src/lib/infect.c:559): Can't inject syscall blob (pid: 24)
> > > > > > > (00.002667) Error (compel/src/lib/infect.c:1312): munmap for remote map 0x3fffd5c5000, 53248 returned 4398002229248
> > > > > > > (00.002669) Error (criu/cr-dump.c:1362): Can't infect (pid: 24) with parasite
> > > > > > > (00.002720) Unlock network
> > > > > > > (00.002735) Unfreezing tasks into 1
> > > > > > > (00.002737) Unseizing 24 into 1
> > > > > > > (00.002740) Error (compel/src/lib/infect.c:341): Unable to detach from 24: No such process
> > > > > > > (00.002745) Unseizing 25 into 1
> > > > > > > (00.002754) Error (criu/cr-dump.c:1800): Dumping FAILED.
> > > > > >
> > > > > > I think the problem is not related to the patch.
> > > > > > Could you send me the full log?
> > > > >
> > > > > https://lisas.de/~adrian/dump-log.s390
> > > >
> > > > I assume that the target process "somehow" dies very early when
> > > > the parasite code is started:
> > > >
> > > > compel/src/lib/infect.c:
> > > >
> > > > 627 if (parasite_run(pid, PTRACE_CONT, ctl->parasite_ip, ctl->rstack, &regs, &ctl->orig))
> > > > 628 goto err;
> > > >
> > > > Here the __export_parasite_head_start() function is executed in the target
> > > > process. This function then calls the parasite_service() function with
> > > > the PARASITE_CMD_INIT_DAEMON command:
> > > >
> > > > compel/arch/s390/plugins/std/parasite-head.S:
> > > >
> > > > ENTRY(__export_parasite_head_start)
> > > > larl %r14,__export_parasite_cmd
> > > > llgf %r2,0(%r14)
> > > > larl %r3,__export_parasite_args
> > > > brasl %r14,parasite_service
> > > > .long 0x00010001 /* S390_BREAKPOINT_U16: Generates SIGTRAP */
> > > > __export_parasite_cmd:
> > > > .long 0
> > > >
> > > > Perhaps you could manually try the following:
> > > >
> > > > 1) Run sleep program:
> > > >
> > > > # ulimit -c unlimited
> > > > # sleep 10000
> > > > [1] 8532
> > > >
> > > > 2) Checkpoint program
> > > >
> > > > # mkdir ~/dump
> > > > # criu/criu dump -t 8532 --shell-job -D ~/dump
> > > >
> > > > 3) Check if we got a core dump for the sleep process
> > >
> > > No, I still get the same error as before:
> > >
> > > (00.056660) Error (compel/src/lib/infect.c:633): Unable to connect a transport socket: Function not implemented
> > > (00.056707) Error (compel/src/lib/infect.c:559): Can't inject syscall blob (pid: 1851)
> > > (00.056721) Error (compel/src/lib/infect.c:1312): munmap for remote map 0x3fff6f49000, 466944 returned 4397894766592
> > > (00.056733) Error (criu/cr-dump.c:1362): Can't infect (pid: 1851) with parasite
> > >
> > >
> > > > Unfortunately I currently can't reproduce this on my RHEL7.4 kernel 3.10.0-685.el7.s390x
> > > > because of a different problem:
> > > >
> > > > ~/criu # criu/criu dump -t 14545 --shell-job -D ~/dump/
> > > > Error (criu/proc_parse.c:2654): Can't open 14545/task/14545/children on procfs: No such file or directory
> > > > Error (criu/cr-dump.c:1800): Dumping FAILED.
> > > >
> > > > ~/criu # ls /proc/14545/task/14545/children
> > > > ls: cannot access /proc/14545/task/14545/children: No such file or directory
> > > >
> > > > I assume the problem is that my kernel does not have CONFIG_CHECKPOINT_RESTORE enabled.
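For reference, the same check that I did with ls above can be sketched in C.
This is only a minimal illustration, not code from CRIU: the helper names
children_path() and has_proc_children() are made up, and it only probes the
main thread (tid == pid) of the given process.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of CRIU): build the procfs path
 * /proc/<pid>/task/<pid>/children for the main thread of a process.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int children_path(char *buf, size_t len, pid_t pid)
{
	int n = snprintf(buf, len, "/proc/%d/task/%d/children",
			 (int)pid, (int)pid);

	return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/*
 * Returns 1 if the children file exists, 0 if it does not,
 * and -1 if the path could not be built.
 */
int has_proc_children(pid_t pid)
{
	char path[64];

	if (children_path(path, sizeof(path), pid) < 0)
		return -1;
	return access(path, F_OK) == 0;
}
```

Calling has_proc_children(getpid()) on the kernel in question should return 0
if the file is missing for the same reason as in my ls output; a return of 1
would point to some other cause.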
> > >
> > > I can provide you a test kernel (off-list). There is always the chance
> > > that I am still missing important patches in my kernel.
> >
> > After enabling the memfd_create() syscall I found the following
> > spot that failed in compel/plugins/std/infect.c:
> >
> > 138 static noinline __used int parasite_init_daemon(void *data)
> > 139 {
> > 140 struct parasite_init_args *args = data;
> > 141 int ret;
> > 142
> > 143 args->sigreturn_addr = (uint64_t)(uintptr_t)fini_sigreturn;
> > 144 sigframe = (void*)(uintptr_t)args->sigframe;
> > 145
> > 146 ret = tsock = sys_socket(PF_UNIX, SOCK_SEQPACKET, 0);
> >
> > Here we get ret = -ENOSYS (-38)
> >
> > 147 if (tsock < 0) {
> > 148 pr_err("Can't create socket: %d\n", tsock);
> > 149 goto err;
> > 150 }
> > ...
> > 172 err:
> > 173 futex_set_and_wake(&args->daemon_connected, ret);
> >
> > Here we set daemon_connected = -38 ...
> >
> > 174 fini();
> > 175 BUG();
> > 176
> > 177 return -1;
> >
> >
> > ... which matches the error message generated in compel/src/lib/infect.c:
> >
> > 631 if (futex_get(&args->daemon_connected) != 1) {
> > 632 errno = -(int)futex_get(&args->daemon_connected);
> >
> > Here we set errno = 38
> >
> > 633 pr_perror("Unable to connect a transport socket");
> > 634 goto err;
> > 635 }
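To make the -38 to "Function not implemented" step explicit, here is a small
standalone sketch. It is not the compel code: errno_from_daemon_status() and
daemon_status_str() are hypothetical helpers that only mimic the conversion
done in lines 631-633, assuming the futex value is a 32-bit unsigned word that
holds either 1 (connected) or a negative errno.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: mirror "errno = -(int)futex_get(...)".
 * The parasite stores a negative errno in an unsigned 32-bit word;
 * casting back to int and negating recovers the positive errno.
 */
int errno_from_daemon_status(uint32_t raw)
{
	return -(int)raw;
}

/*
 * Hypothetical helper: return the text that pr_perror() would
 * append for a given daemon_connected value.
 */
const char *daemon_status_str(uint32_t raw)
{
	if (raw == 1)
		return "connected";
	return strerror(errno_from_daemon_status(raw));
}
```

With raw set to (uint32_t)-ENOSYS, errno_from_daemon_status() yields ENOSYS
(38 on Linux), and strerror() turns that into the message seen in the log.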
> >
> > So it looks like you have to wire up sys_socket() for RHEL7?
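A quick user-space probe for this could look like the sketch below. It is
only an illustration under some assumptions: socket_syscall_wired() is a
made-up name, and it relies on the C library headers defining SYS_socket for
the architecture. It issues the direct socket syscall with the same arguments
as the parasite and treats ENOSYS as "not wired up".

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical probe (not part of CRIU):
 *   1  the kernel accepted the direct socket syscall, or failed
 *      with an error other than ENOSYS
 *   0  the kernel answered ENOSYS, i.e. the syscall is not wired up
 *  -1  the headers do not define a number for the syscall
 */
int socket_syscall_wired(void)
{
#ifdef SYS_socket
	long fd = syscall(SYS_socket, AF_UNIX, SOCK_SEQPACKET, 0);

	if (fd >= 0) {
		close((int)fd);
		return 1;
	}
	return errno != ENOSYS;
#else
	return -1;
#endif
}
```

The probe bypasses the socket() library wrapper on purpose, because the
wrapper may fall back to another entry point and hide a missing syscall.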
>
> Thanks for helping to figure this out. It was not only sys_socket().
> Basically all syscalls from
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=977108f89c989b1eeb5c8d938e1e71913391eb5f
>
> which were present in
> compel/arch/s390/plugins/std/syscalls/syscall-s390.tbl had to be added
> to the kernel. Now zdtm seems to be quite happy:
That's good news.
> $ ./zdtm.py run -a -f h --keep-going -x zdtm/static/s390x_mmap_high
> [...]
>
> ################### 1 TEST(S) FAILED (TOTAL 315/SKIPPED 115) ###################
> * zdtm/static/stopped(h)
> ##################################### FAIL #####################################
>
> I have to exclude zdtm/static/s390x_mmap_high as that test case just
> hangs. zdtm says ==== ALARM ==== and then prints a process list.
I assume you have already applied the kernel patch
ee71d16d22 ("s390/mm: make TASK_SIZE independent from the number
of page table levels").
But it should not hang without the patch either - so we have to look into
this issue.
Michael