[CRIU] PID mismatch problem

Federico Reghenzani federico1.reghenzani at mail.polimi.it
Thu Dec 17 04:22:37 PST 2015


Hi Pavel,
thank you for your answer. I noticed today that it sometimes works, so the
problem is intermittent.

2015-12-17 12:38 GMT+01:00 Pavel Emelyanov <xemul at parallels.com>:

> On 12/16/2015 12:35 PM, Federico Reghenzani wrote:
> > Hi all,
> >
> > I have a strange problem when trying to restore an image.
> >
> > Running "criu restore -D directory" with the same image, I get different
> > error messages. In particular, sometimes it says:
> >
> >     /  5485: Error (cr-restore.c:1499): Pid 5488 do not match expected 5487/
>
> This means that while you were restoring the tree, some tasks were
> fork()-ing in parallel, thus occupying the PIDs that were supposed
> to be assigned to your tasks.
>
> To fix this reliably, you should either C/R a container or use the
> --unshare option from the patches at
> https://lists.openvz.org/pipermail/criu/2015-December/023995.html
>
> >     /  5485: Error (cr-restore.c:1262): 5488 exited, status=1/
> >     /Killed/
> >
> >
> > other times it says:
> >
> >     /  5485: Error (tty.c:531): tty: Unable to open dev/ptmx with specified index 0/
> >     /  5485: Error (tty.c:917): tty: Can't open a (index 0): Bad file descriptor/
>
> Did you use the -j option on dump? If so, the likely cause is that the
> same -j option is missing on restore.
>

I'm using -j neither for dump nor for restore. I also tried adding it, but
it doesn't seem to change anything. (I'm using the C API, so I set it with
criu_set_shell_job.)
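For reference, a rough way to tell whether a tree is a "shell job" in the sense -j/--shell-job (criu_set_shell_job in the C API) is meant for: as far as I understand, that option covers trees whose session, and usually controlling terminal, belong to a task outside the dumped tree. The helper below is a hypothetical sketch, not part of CRIU or libcriu; it only checks whether the root task of the tree leads its own session.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a CRIU/libcriu API).
 * Returns true if `root` is NOT the leader of its own session, i.e. its
 * session ID belongs to some other task, such as the shell that started it.
 * For the root of a dumped tree, that session leader cannot be inside the
 * tree (setsid() creates a session containing only the caller), which is
 * the situation --shell-job is meant for.
 * Returns false if getsid() fails (no such task, or not permitted).
 */
static bool session_is_external(pid_t root)
{
    pid_t sid = getsid(root);

    if (sid < 0)
        return false;
    return sid != root;
}
```

A tree started from an interactive shell would normally report true here, while a daemon that called setsid() itself reports false.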


>
> >     /Error (files-reg.c:445):  `- XFail [/dev/shm/open_mpi.0000.cr.1.ghost] ghost: No such file or directory/
>
> MPI? Are you trying to C/R MPI jobs?
>

Yes, we are trying to add to Open MPI the ability to migrate orted
daemons between nodes. Currently we do not checkpoint individual MPI
processes, but the entire daemon together with its children.
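Since CRIU restores every task of the dumped tree with the same PID it had at dump time, it may help to record which PIDs the orted tree actually contains. A minimal sketch, assuming Linux with /proc mounted; read_ppid and list_children are hypothetical names, and only direct children are listed (descendants would need recursion):

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the parent PID of `pid` from /proc/<pid>/stat.
 * The comm field (2nd) is wrapped in parentheses and may itself contain
 * spaces or ')', so parse from the LAST ')' onward: " <state> <ppid> ...".
 * Returns -1 if the task vanished or the line could not be parsed.
 */
static pid_t read_ppid(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    char *rparen = strrchr(buf, ')');
    char state;
    int ppid;
    if (!rparen || sscanf(rparen + 1, " %c %d", &state, &ppid) != 2)
        return -1;
    return (pid_t)ppid;
}

/*
 * Hypothetical helper: store in `out` (capacity `max`) the PIDs of the
 * direct children of `parent`, found by scanning /proc. Returns the total
 * number found, which may exceed `max` (only the first `max` are stored).
 * This is a snapshot: tasks can fork or exit while the scan runs.
 */
static size_t list_children(pid_t parent, pid_t *out, size_t max)
{
    DIR *d = opendir("/proc");
    if (!d)
        return 0;

    size_t count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (!isdigit((unsigned char)e->d_name[0]))
            continue;                  /* skip non-PID entries like "self" */
        pid_t pid = (pid_t)strtol(e->d_name, NULL, 10);
        if (read_ppid(pid) == parent) {
            if (count < max)
                out[count] = pid;
            count++;
        }
    }
    closedir(d);
    return count;
}
```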


>
> >     /Error (cr-restore.c:1995): Restoring FAILED./
> >
> >
> > Note that in the first case there is no active process with that PID, and
> > all other processes have PIDs below 1000.
>
> Hm... If so, can you strace the restore with the -f option, so that we
> can check where the "bad" process comes from?
>
>
I'll try this option and the --unshare option in the next few days (probably
on Monday) and let you know.
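To double-check the "no active process with that PID" observation right before a restore, a small check like the following could be used. It is a hypothetical sketch, not part of CRIU: kill(pid, 0) only tests whether a task with that PID exists and sends no signal. It is inherently racy, though: as far as I know, CRIU of this version obtains each PID by writing to /proc/sys/kernel/ns_last_pid and then forking, so an unrelated fork() between those two steps takes the PID even if it was free when checked. That is why restoring in a fresh PID namespace (a container, or --unshare) is the reliable fix.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns true if a task with this PID currently exists in our PID
 * namespace (zombies included, since they still hold their PID).
 * kill(pid, 0) does the existence/permission checks without sending a
 * signal: ESRCH means no such task, EPERM means it exists but we may
 * not signal it. PIDs <= 0 address process groups, so reject them.
 */
static bool pid_in_use(pid_t pid)
{
    if (pid <= 0)
        return false;
    if (kill(pid, 0) == 0)
        return true;
    return errno == EPERM;
}

/*
 * Hypothetical pre-restore check: count how many of the PIDs the image
 * expects (e.g. 5485..5488 from the log above) are already taken.
 * Only a diagnostic snapshot; see the race described above.
 */
static size_t count_taken(const pid_t *expected, size_t n)
{
    size_t taken = 0;
    for (size_t i = 0; i < n; i++)
        if (pid_in_use(expected[i]))
            taken++;
    return taken;
}
```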


> > CRIU version: 1.8
> >
> > criu check output:
> >
> >
> >     /Error (cr-check.c:634): Kernel doesn't support PTRACE_O_SUSPEND_SECCOMP/
> >     /Error (cr-check.c:683): Dumping seccomp filters not supported: Input/output error/
> >     /Warn  (cr-check.c:696): Dirty tracking is OFF. Memory snapshot will not work./
> >
> >
> >
> > Thank you in advance, any help would be appreciated.
> >
> > Cheers,
> > Federico
> >
> > __
> > Federico Reghenzani
> > M.Eng. Student @ Politecnico di Milano
> > Computer Science and Engineering
> >
>
>
Thanks again,

Cheers,
Federico