[CRIU] crash in pb_read_one?
Tycho Andersen
tycho.andersen at canonical.com
Tue Sep 16 08:16:19 PDT 2014
Hi all,
While working on fixing the missing-pid patchset, I'm hitting a
segfault in pb_read_one. The patch that causes it is:
https://github.com/tych0/criu/commit/a3f901f116d951ce369cc6355aafd4cecd398c79
(the branch is missing-pid)
I get the following output for one of the tests on that branch:
criu:/tmp/criu/test missing-pid 1 sudo ./zdtm.sh ns/static/session00
================================= CRIU CHECK =================================
Error (timerfd.c:56): timerfd: No timerfd support for c/r: Inappropriate ioctl for device
Error (cr-check.c:269): fdinfo doesn't contain the mnt_id field
============================= WARNING =============================
Not all features needed for CRIU are merged to upstream kernel yet,
so for now we maintain our own branch which can be cloned from:
git://git.kernel.org/pub/scm/linux/kernel/git/gorcunov/linux-cr.git
===================================================================
Execute zdtm/live/static/session00
./session00 --pidfile=session00.pid --outfile=session00.out
/tmp/criu/test
Dump 25642
Restore
cat: /proc/25686/maps: No such file or directory
Check results 25686
./zdtm.sh: line 462: kill: (25686) - No such process
Unable to stop session00 (25686)
Test: zdtm/live/static/session00, Result: FAIL
==================================== ERROR ====================================
Test: zdtm/live/static/session00, Namespace: 1
Dump log : /tmp/criu/test/dump/static/session00/25642/1/dump.log
--------------------------------- grep Error ---------------------------------
------------------------------------- END -------------------------------------
Restore log: /tmp/criu/test/dump/static/session00/25642/1/restore.log
--------------------------------- grep Error ---------------------------------
(00.044859) 1: Error (cr-restore.c:2155): before pb_read_one in 25686
(00.045951) 14: Error (cr-restore.c:2155): before pb_read_one in 25695
(00.045960) 14: Error (cr-restore.c:2157): after pb_read_one in 25695
(00.045979) 5: Error (cr-restore.c:2155): before pb_read_one in 25692
(00.045986) 5: Error (cr-restore.c:2157): after pb_read_one in 25692
(00.046197) 11: Error (cr-restore.c:2155): before pb_read_one in 25693
(00.046204) 11: Error (cr-restore.c:2157): after pb_read_one in 25693
(00.046591) 8: Error (cr-restore.c:2155): before pb_read_one in 25699
(00.046604) 8: Error (cr-restore.c:2157): after pb_read_one in 25699
(00.046843) 16: Error (cr-restore.c:2155): before pb_read_one in 25702
(00.046854) 16: Error (cr-restore.c:2157): after pb_read_one in 25702
(00.046875) 15: Error (cr-restore.c:2155): before pb_read_one in 25701
(00.046883) 15: Error (cr-restore.c:2157): after pb_read_one in 25701
(00.047400) 12: Error (cr-restore.c:2155): before pb_read_one in 25696
(00.047407) 12: Error (cr-restore.c:2157): after pb_read_one in 25696
(00.047722) 9: Error (cr-restore.c:2155): before pb_read_one in 25698
(00.047731) 9: Error (cr-restore.c:2157): after pb_read_one in 25698
(00.048236) 13: Error (cr-restore.c:2155): before pb_read_one in 25700
(00.048245) 13: Error (cr-restore.c:2157): after pb_read_one in 25700
(00.048584) 7: Error (cr-restore.c:2155): before pb_read_one in 25697
(00.048591) 7: Error (cr-restore.c:2157): after pb_read_one in 25697
(00.187746) Error (cr-restore.c:1162): 25686 killed by signal 11
(00.187815) Error (cr-restore.c:1787): Restoring FAILED.
------------------------------------- END -------------------------------------
================================= ERROR OVER =================================
It looks like the init task is crashing when reading its creds: task 1
(pid 25686) logs "before pb_read_one" but never "after", and is then
killed by signal 11. Does anyone have any idea why this might be?
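For anyone unfamiliar with the "killed by signal 11" line: on Linux, signal 11 is SIGSEGV, and a parent learns about it from the wait status of the dead child. The sketch below is only an illustration of that mechanism, not CRIU's code; fatal_signal_of() and spawn_and_raise() are hypothetical names made up for this example.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not CRIU code: return the signal that killed
 * `pid`, 0 if it exited normally, or -1 if waitpid() fails.
 */
static int fatal_signal_of(pid_t pid)
{
	int status;

	if (waitpid(pid, &status, 0) < 0)
		return -1;

	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Hypothetical test aid: fork a child that kills itself with `sig`. */
static pid_t spawn_and_raise(int sig)
{
	pid_t pid = fork();

	if (pid == 0) {
		signal(sig, SIG_DFL);	/* make sure the default action applies */
		raise(sig);
		_exit(0);		/* only reached if the signal is not fatal */
	}
	return pid;
}
```

Calling fatal_signal_of(spawn_and_raise(SIGSEGV)) should report SIGSEGV, which is the same condition the restore log reports for 25686.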
Also, I need to pass the child pids into the restorer blob because of
shmem: we can't block after restore_fs(), because shmem is restored
inside the restorer blob, and some calls in restore_fs() block until
shmem is restored. We can't wait on the pstree helpers until after
restore_fs() is done, so I think the best solution is to just wait on
them after the restore stage is over.
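To make the "wait on the helpers once the restore stage is over" idea concrete, here is a minimal sketch in plain C. It assumes the helper pids have already been collected into an array; wait_on_helpers() and spawn_helper() are hypothetical names for illustration and are not taken from the missing-pid branch. Code running inside the restorer blob itself cannot use libc and would need raw syscalls instead, so treat this as the shape of the loop only.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not CRIU code: reap every pid in `helpers`
 * and return how many did not exit cleanly, or -1 if waitpid()
 * itself fails.
 */
static int wait_on_helpers(const pid_t *helpers, size_t n)
{
	int failed = 0;

	for (size_t i = 0; i < n; i++) {
		int status;
		pid_t ret;

		/* Retry if a signal interrupts the wait. */
		do {
			ret = waitpid(helpers[i], &status, 0);
		} while (ret < 0 && errno == EINTR);

		if (ret < 0)
			return -1;

		/* Count helpers that were killed or exited non-zero. */
		if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
			failed++;
	}

	return failed;
}

/* Hypothetical test aid: fork a child that exits with `code`. */
static pid_t spawn_helper(int code)
{
	pid_t pid = fork();

	if (pid == 0)
		_exit(code);
	return pid;
}
```

Waiting on each pid explicitly, rather than on any child with waitpid(-1, ...), keeps the loop from accidentally reaping restored tasks that are not helpers.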
Any thoughts are much appreciated. I've been messing around with this
for a while and still haven't managed to get it right.
Tycho