[CRIU] [PATCH 0/4] restore: Fix potential hung on restore
Cyrill Gorcunov
gorcunov at gmail.com
Fri Dec 7 14:57:08 MSK 2018
Occasionally we hit a hanging container which was restored
without problems but was unable to finish the restore because we
kept spinning on nr_in_progress = 1 forever. Sadly I'm unable to
reproduce this situation locally (this is the first time it has
triggered in the whole lifetime of the project), so ideas are
welcome.
| 220538 ? Ss 0:00 \_ init -z
| 222830 ? Ss 0:00 \_ /usr/sbin/xinetd -stayalive -pidfile /var/run/xinetd.pid
| 222831 ? Ss 0:00 \_ /sbin/agetty --noclear tty2 linux
| 222832 ? Ss 0:00 \_ /usr/sbin/httpd -DFOREGROUND
| 222833 ? Ss 0:00 \_ /usr/sbin/saslauthd -m /run/saslauthd -a pam -n 2
| 226970 ? S 0:00 | \_ /usr/sbin/saslauthd -m /run/saslauthd -a pam -n 2
| 222834 ? Ss 0:00 \_ /usr/sbin/crond -n
| 222836 ? Zs 0:00 \_ [firewalld] <defunct>
| 222837 ? Ss 0:00 \_ /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
| 222839 ? Ss 0:00 \_ /usr/lib/systemd/systemd-logind
| 222840 ? Z 0:00 \_ [firewalld] <defunct>
| 222841 ? Ss 0:00 \_ /usr/lib/systemd/systemd-udevd
| 222842 ? Ss 0:00 \_ /usr/lib/systemd/systemd-journald
| (00.451549) 84: expecting helper child 301 to exit
| (00.451557) 84: zombie: gonna wait for children to exit
| (00.451556) 301: expecting zombie child 91 to exit
| (00.451565) 301: helper: gonna wait for children to exit
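For illustration only, and not the code from this series: the log
above shows a zombie and a helper each waiting for children to exit.
A minimal sketch of such a wait loop, one that reports errors from
waitpid() instead of ignoring them, might look like the following.
The name wait_all_children() is hypothetical, and the EINTR/ECHILD
handling is an assumption about what a careful caller would want.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, NOT taken from CRIU: reap every child of the
 * calling process. Returns the number of children reaped, or -1 if
 * waitpid() fails for a reason other than EINTR or ECHILD.
 */
int wait_all_children(void)
{
	int reaped = 0;

	for (;;) {
		int status;
		pid_t pid = waitpid(-1, &status, 0);

		if (pid > 0) {
			fprintf(stderr, "child %d exited, status %#x\n",
				(int)pid, (unsigned int)status);
			reaped++;
			continue;
		}

		if (errno == EINTR)
			continue;	/* interrupted by a signal, retry */
		if (errno == ECHILD)
			return reaped;	/* no children left to wait for */

		/* A real error: report it rather than waiting forever. */
		fprintf(stderr, "waitpid failed: %s\n", strerror(errno));
		return -1;
	}
}
```

A caller would treat a negative return value as a restore failure
rather than carrying on as if all children had exited.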
Cyrill Gorcunov (4):
restore: zombie -- Add more detailed log on signals
restore: Add more detailed log in wait_exiting_children
restore: Don't ignore errors on wait in restore_one_zombie
restore: Fix hang if root task is waiting on zombie
criu/cr-restore.c | 17 +++++++++++++----
criu/pie/restorer.c | 36 +++++++++++++++++++++++++++++++-----
2 files changed, 44 insertions(+), 9 deletions(-)
--
2.17.2