[CRIU] Fwd: Checkpoint failure on arm64 platform

Vijay Kilari vijay.kilari at gmail.com
Mon Dec 21 22:03:41 PST 2015


On Mon, Dec 21, 2015 at 11:41 PM, Vijay Kilari <vijay.kilari at gmail.com> wrote:
> On Mon, Dec 21, 2015 at 6:47 PM, Pavel Emelyanov <xemul at parallels.com> wrote:
>>
>>> (00.106975) Error (parasite-syscall.c:815): Can't retrieve FD from socket
>>> pie: Daemon waits for command
>>> (00.106999) Wait for ack 15 on daemon socket
>>> (00.107036) Error (parasite-syscall.c:298): Message reply from daemon
>>> is trimmed (12/0)
>>> (00.107047) Error (cr-dump.c:1216): Can't get proc fd (pid: 1456)
>>> (00.107066) Waiting for 1456 to trap
>>> (00.107080) Daemon 1456 exited trapping
>>>
>>> I added a printk to the readlinkat syscall in the kernel to see the context
>>> in which /proc/self is read. It shows the same process ID, 1456, and the
>>> name 'tail', which is the process running inside the container.
>>>
>>> [ 6461.973166] In readlinkat error < 0 -9 pid 1456 name tail
>>
>> Heh.. Would you dig this deeper to find out where the EBADF comes from?
>
> The parameter values received by the readlink system call are wrong.
> It looks like there is a mismatch in the system call numbers: readlink()
> ends up calling readlinkat().
>
> Need to debug more and confirm it.

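In arch/arm/syscall.def, sys_readlink is assigned syscall number 78, which on
arm64 is actually the number of sys_readlinkat. Because the two calls take
different parameters, the kernel misinterprets the arguments (the path pointer
lands in the dirfd slot) and the call fails with EBADF.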

The following changes worked:

pie/parasite.c

-ret = sys_readlink("/proc/self", buf, sizeof(buf));
+ret = sys_readlinkat(AT_FDCWD, "/proc/self", buf, sizeof(buf));

arch/arm/syscall.def

-readlink      78      85       (const char *path, char *buf, int bufsize)
+readlinkat    78      85       (int fd, const char *path, char *buf, int bufsize)

With these changes, plus changing PAGE_SIZE to 64KB, no errors are reported
during checkpoint. However, restore fails. Below are the steps I tried:

ubuntu@ubuntu:~/criu/criu-1.8$ sudo docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
ubuntu@ubuntu:~/criu/criu-1.8$ sudo docker run -d justinzh/arm64-vivid:latest tail -f /dev/null
2ee07bc4adf3493473801651ce8f030a1fcc778bc79eae79fedb3afde39d7438
ubuntu@ubuntu:~/criu/criu-1.8$ sudo docker ps
CONTAINER ID        IMAGE                         COMMAND               CREATED             STATUS              PORTS               NAMES
2ee07bc4adf3        justinzh/arm64-vivid:latest   "tail -f /dev/null"   5 seconds ago       Up 4 seconds                            adoring_kowalevski
ubuntu@ubuntu:~/criu/criu-1.8$ sudo docker checkpoint 2ee07bc4adf3
2ee07bc4adf3
ubuntu@ubuntu:~/criu/criu-1.8$
ubuntu@ubuntu:~/criu/criu-1.8$ sudo docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
ubuntu@ubuntu:~/criu/criu-1.8$ sudo docker restore 2ee07bc4adf3
Error response from daemon: Cannot restore container 2ee07bc4adf3: cantstart: Cannot start container 2ee07bc4adf3493473801651ce8f030a1fcc778bc79eae79fedb3afde39d7438: criu failed: type NOTIFY errno 0
log file: /var/lib/docker/0.0/containers/2ee07bc4adf3493473801651ce8f030a1fcc778bc79eae79fedb3afde39d7438/criu.work/restore.log
Error: failed to restore one or more containers
ubuntu@ubuntu:~/criu/criu-1.8$

Questions

1) After checkpoint, 'docker ps' no longer shows the checkpointed
container. Is this expected behaviour?
2) Restore fails at the clone() call; see the restore.log excerpt below.

restore.log
--------------
(00.006447) Warn  (cr-restore.c:1075): Set CLONE_PARENT | CLONE_NEWPID but it might cause restore problem,because not all kernels support such clone flags combinations!
(00.006469) Forking task with 1 pid (flags 0x6c028000)
(00.168693) Error (cr-restore.c:1175): Can't fork for 1: Invalid argument
(00.236429) Error (cr-restore.c:1995): Restoring FAILED.


>
>>
>>> zdtm.sh shows that dump succeeds, whereas restore fails in the
>>> clone syscall.
>>> Looks like zdtm.sh is not testing /proc/self.
>>>
>>> ubuntu@ubuntu:~/criu/criu-1.8$ sudo ./test/zdtm.sh
>>
>> There can be an issue with sudo. Try to go to "fair" root with
>>
>> $ sudo su -
>>
>> then starting the zdtm.sh.
>>
> The outcome is the same.
>
>>> ================================= CRIU CHECK =================================
>>> Error (cr-check.c:634): Kernel doesn't support PTRACE_O_SUSPEND_SECCOMP
>>> Error (cr-check.c:572): read: Invalid argument
>>> Error (cr-check.c:826): CLONE_PARENT | CLONE_NEWPID don't work together
>>> ============================= WARNING =============================
>>> Not all features needed for CRIU are merged to upstream kernel yet,
>>> so for now we maintain our own branch which can be cloned from:
>>> git://git.kernel.org/pub/scm/linux/kernel/git/gorcunov/linux-cr.git
>>> ===================================================================
>>> Execute static/pipe00
>>> ./pipe00 --pidfile=pipe00.pid --outfile=pipe00.out
>>> Dump 5319
>>> Restore
>>> Test: zdtm/live/static/pipe00, Result: FAIL
>>> ==================================== ERROR ====================================
>>> Test: zdtm/live/static/pipe00, Namespace:
>>> Dump log   : /home/ubuntu/criu/criu-1.8/test/dump/static/pipe00/5319/1/dump.log
>>> --------------------------------- grep Error ---------------------------------
>>> ------------------------------------- END -------------------------------------
>>> Restore log: /home/ubuntu/criu/criu-1.8/test/dump/static/pipe00/5319/1/restore.log
>>> --------------------------------- grep Error ---------------------------------
>>> (00.012996) Error (cr-restore.c:1175): Can't fork for 5319: Invalid argument
>>> (00.013053) Error (cr-restore.c:1995): Restoring FAILED.
>>> ------------------------------------- END -------------------------------------
>>> ================================= ERROR OVER =================================
>>> ubuntu@ubuntu:~/criu/criu-1.8$ criu --version
>>> Version: 1.8
>>>
>>>>
>>>> -- Pavel
>>> .
>>>
>>

