[CRIU] [PATCH 0/4] restore: Fix potential hung on restore

Cyrill Gorcunov gorcunov at gmail.com
Fri Dec 7 14:57:08 MSK 2018


We have seen a container hang on restore: its tasks were restored
without problems, but the restore could not finish because we kept
spinning on nr_in_progress = 1 forever. Sadly, I'm unable to
reproduce this situation locally (it has triggered for the first
time in the whole lifetime of the project), so ideas are welcome.

  | 220538 ?        Ss     0:00              \_ init -z
  | 222830 ?        Ss     0:00                  \_ /usr/sbin/xinetd -stayalive -pidfile /var/run/xinetd.pid
  | 222831 ?        Ss     0:00                  \_ /sbin/agetty --noclear tty2 linux
  | 222832 ?        Ss     0:00                  \_ /usr/sbin/httpd -DFOREGROUND
  | 222833 ?        Ss     0:00                  \_ /usr/sbin/saslauthd -m /run/saslauthd -a pam -n 2
  | 226970 ?        S      0:00                  |   \_ /usr/sbin/saslauthd -m /run/saslauthd -a pam -n 2
  | 222834 ?        Ss     0:00                  \_ /usr/sbin/crond -n
  | 222836 ?        Zs     0:00                  \_ [firewalld] <defunct>
  | 222837 ?        Ss     0:00                  \_ /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
  | 222839 ?        Ss     0:00                  \_ /usr/lib/systemd/systemd-logind
  | 222840 ?        Z      0:00                  \_ [firewalld] <defunct>
  | 222841 ?        Ss     0:00                  \_ /usr/lib/systemd/systemd-udevd
  | 222842 ?        Ss     0:00                  \_ /usr/lib/systemd/systemd-journald

  | (00.451549)     84:     expecting helper child 301 to exit
  | (00.451557)     84: zombie: gonna wait for children to exit
  | (00.451556)    301:     expecting zombie child 91 to exit
  | (00.451565)    301: helper: gonna wait for children to exit
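The log above shows two tasks each waiting for a child to exit. As a rough illustration of why a failed wait should not be ignored, here is a minimal sketch in plain C. It is not the code from this series: wait_child_exit() and spawn_exiting_child() are hypothetical names made up for the example, and it only uses standard fork()/waitpid() calls.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from CRIU: wait for one specific
 * child to exit. Returns 0 once the child is reaped, or -errno if
 * waitpid() fails, so the caller can abort instead of spinning.
 */
static int wait_child_exit(pid_t pid, int *status)
{
	for (;;) {
		pid_t ret = waitpid(pid, status, 0);

		if (ret >= 0)
			return 0;	/* child has been reaped */
		if (errno == EINTR)
			continue;	/* interrupted by a signal, retry */

		int err = errno;
		fprintf(stderr, "waitpid(%d) failed: %s\n",
			(int)pid, strerror(err));
		return -err;	/* e.g. -ECHILD: nothing left to wait for */
	}
}

/*
 * Hypothetical helper for the example: fork a child that exits
 * immediately with exit_code. Returns the child's pid, or -1 if
 * fork() failed.
 */
static pid_t spawn_exiting_child(int exit_code)
{
	pid_t pid = fork();

	if (pid == 0)
		_exit(exit_code);
	return pid;
}
```

A caller would spawn the child, call wait_child_exit() on its pid, and treat any negative return value as a fatal restore error rather than continuing to wait on a child that can no longer be reaped.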

Cyrill Gorcunov (4):
  restore: zombie -- Add more detailed log on signals
  restore: Add more detailed log in wait_exiting_children
  restore: Don't ignore errors on wait in restore_one_zombie
  restore: Fix hang if root task is waiting on zombie

 criu/cr-restore.c   | 17 +++++++++++++----
 criu/pie/restorer.c | 36 +++++++++++++++++++++++++++++++-----
 2 files changed, 44 insertions(+), 9 deletions(-)

-- 
2.17.2


