[CRIU] lxc-checkpoint restore failed

Jason Lee ldm5235 at gmail.com
Wed Oct 14 20:12:10 PDT 2015


OK, I have applied the patch. The error output is now:

Warn  (cr-restore.c:1047): Set CLONE_PARENT | CLONE_NEWPID but it might cause restore problem,because not all kernels support such clone flags combinations!
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
RTNETLINK answers: File exists
     1: Error (sk-unix.c:1245): sk unix: Can't unlink stale socket 0x2c4ff peer 0 (name /run/systemd/notify dir -)
: No such file or directory
     1: Error (sk-unix.c:1245): sk unix: Can't unlink stale socket 0x2c501 peer 0 (name /run/systemd/private dir -)
: No such file or directory
     1: Error (sk-unix.c:1245): sk unix: Can't unlink stale socket 0x2c50b peer 0 (name /run/systemd/shutdownd dir -)
: No such file or directory
     1: Error (sk-unix.c:1245): sk unix: Can't unlink stale socket 0x2c50d peer 0 (name /run/systemd/journal/dev-log dir -)
: No such file or directory
     1: Error (sk-unix.c:1245): sk unix: Can't unlink stale socket 0x2c511 peer 0 (name /run/systemd/journal/stdout dir -)
: No such file or directory
     1: Error (sk-unix.c:1245): sk unix: Can't unlink stale socket 0x2c513 peer 0 (name /run/systemd/journal/socket dir -)
: No such file or directory
     1: Error (sk-unix.c:1245): sk unix: Can't unlink stale socket 0x2b88d peer 0x28bc5 (name /run/systemd/journal/stdout dir -)
: No such file or directory
     1: Error (sk-unix.c:1245): sk unix: Can't unlink stale socket 0x2c289 peer 0x2abaf (name /run/systemd/journal/stdout dir -)
: No such file or directory
     1: Error (sk-unix.c:1245): sk unix: Can't unlink stale socket 0x2c28c peer 0x2b4cf (name /run/systemd/journal/stdout dir -)
: No such file or directory
     1: Error (sk-unix.c:1245): sk unix: Can't unlink stale socket 0x2c28d peer 0x2b33d (name /run/systemd/journal/stdout dir -)
: No such file or directory
     1: Error (sk-unix.c:1245): sk unix: Can't unlink stale socket 0x2c28a peer 0x2c8fa (name /run/systemd/journal/stdout dir -)
: No such file or directory
     1: Error (sk-unix.c:1245): sk unix: Can't unlink stale socket 0x2c28b peer 0x2b902 (name /run/systemd/journal/stdout dir -)
: No such file or directory
    68: Error (sk-packet.c:419): Can't bind packet socket: Invalid argument
Error (cr-restore.c:1235): 28680 killed by signal 19
Error (cr-restore.c:1235): 28680 killed by signal 19
Error (cr-restore.c:1959): Restoring FAILED.

BTW, my network configuration is:
br0       Link encap:Ethernet  HWaddr 40:f2:e9:d2:81:38
          inet addr:xxx  Bcast:xxx  Mask:255.255.255.0
          inet6 addr: xxx Scope:Global
          inet6 addr: xxx Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1472399 errors:0 dropped:0 overruns:0 frame:0
          TX packets:30432 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:115725262 (110.3 MiB)  TX bytes:4463645 (4.2 MiB)

eth0      Link encap:Ethernet  HWaddr 40:f2:e9:d2:81:38
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1642533 errors:0 dropped:1196 overruns:0 frame:0
          TX packets:100308 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:352311619 (335.9 MiB)  TX bytes:9232447 (8.8 MiB)
          Memory:90580000-9059ffff

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)



2015-10-15 2:54 GMT+08:00 Tycho Andersen <tycho.andersen at canonical.com>:

> Hi Jason, Pavel,
>
> On Wed, Oct 14, 2015 at 03:03:37PM +0300, Pavel Emelyanov wrote:
> > Adding Tycho (an LXC guy) to the discussion.
> >
> > On 10/14/2015 06:56 AM, Jason Lee wrote:
> > > Hi all!
> > > > Recently I have been using lxc-checkpoint to checkpoint/restore a Linux
> > > > container. Dumping with CRIU works without problems, but restoring the
> > > > container with lxc-checkpoint -r fails.
> > > > BTW, my host OS is Debian 8. Here is my environment:
> > >
> > > lxc.rootfs = /usr/local/var/lib/lxc/d1/rootfs
> > > lxc.include = /usr/local/share/lxc/config/debian.common.conf
> > > lxc.utsname = d1
> > > lxc.arch = amd64
> > > lxc.tty = 0
> > > lxc.pts = 1
> > > lxc.console = none
> > >
> > > #lxc.cap.drop = sys_module mac_admin mac_override sys_time
> > > lxc.cgroup.devices.deny = c 5:1 rwm
> > > lxc.aa_allow_incomplete = 1
> > > lxc.network.type = veth
> > > lxc.network.flags = up
> > > # that's the interface defined above in host's interfaces file
> > > lxc.network.link = br0
> > > # name of network device inside the container,
> > > # defaults to eth0, you could choose a name freely
> > > > # lxc.network.name = lxcnet0
> > > lxc.network.hwaddr = 00:16:3e:d2:29:be
> > >
> > > > Mount points:
> > > root at dslab:/home# mount
> > > sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
> > > proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
> > > > udev on /dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=1002688,mode=755)
> > > > devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
> > > > tmpfs on /run type tmpfs (rw,nosuid,relatime,size=1607656k,mode=755)
> > > > /dev/sda6 on / type ext4 (rw,relatime,errors=remount-ro,data=ordered)
> > > > securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
> > > > tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
> > > > tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
> > > > tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
> > > > cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
> > > > pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
> > > > cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset,clone_children)
> > > > cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
> > > > cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
> > > > cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
> > > > cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
> > > > cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
> > > > cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
> > > > cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
> > > > cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
> > > > cgroup on /sys/fs/cgroup/debug type cgroup (rw,nosuid,nodev,noexec,relatime,debug)
> > > > cgroup on /sys/fs/cgroup/palloc type cgroup (rw,nosuid,nodev,noexec,relatime,palloc)
> > > > systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=23,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
> > > debugfs on /sys/kernel/debug type debugfs (rw,relatime)
> > > mqueue on /dev/mqueue type mqueue (rw,relatime)
> > > hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
> > > /dev/sda4 on /boot type ext4 (rw,relatime,data=ordered)
> > > rpc_pipefs on /run/rpc_pipefs type rpc_pipefs (rw,relatime)
> > >
> > > root at dslab:/home# lxc-checkpoint -r -n d1 -D /home/checkpoint_dir/d2/
> > > > lxc-checkpoint: cgfs.c: cgroup_rmdir: 207 Device or resource busy - cgroup_rmdir: failed to delete /sys/fs/cgroup/palloc/lxc/d1-2
> > > > lxc-checkpoint: cgfs.c: cgroup_rmdir: 207 Device or resource busy - cgroup_rmdir: failed to delete /sys/fs/cgroup/debug/lxc/d1-2
> > > > lxc-checkpoint: cgfs.c: cgroup_rmdir: 207 Device or resource busy - cgroup_rmdir: failed to delete /sys/fs/cgroup/hugetlb/lxc/d1-2
> > > > lxc-checkpoint: cgfs.c: cgroup_rmdir: 207 Device or resource busy - cgroup_rmdir: failed to delete /sys/fs/cgroup/perf_event/lxc/d1-2
> > > > lxc-checkpoint: cgfs.c: cgroup_rmdir: 207 Device or resource busy - cgroup_rmdir: failed to delete /sys/fs/cgroup/blkio/lxc/d1-2
> > > > lxc-checkpoint: cgfs.c: cgroup_rmdir: 207 Device or resource busy - cgroup_rmdir: failed to delete /sys/fs/cgroup/net_cls,net_prio/lxc/d1-2
> > > > lxc-checkpoint: cgfs.c: cgroup_rmdir: 207 Device or resource busy - cgroup_rmdir: failed to delete /sys/fs/cgroup/freezer/lxc/d1-2
> > > > lxc-checkpoint: cgfs.c: cgroup_rmdir: 207 Device or resource busy - cgroup_rmdir: failed to delete /sys/fs/cgroup/devices/lxc/d1-2
> > > > lxc-checkpoint: cgfs.c: cgroup_rmdir: 207 Device or resource busy - cgroup_rmdir: failed to delete /sys/fs/cgroup/memory/lxc/d1-2
> > > > lxc-checkpoint: cgfs.c: cgroup_rmdir: 207 Device or resource busy - cgroup_rmdir: failed to delete /sys/fs/cgroup/cpu,cpuacct/lxc/d1-2
> > > > lxc-checkpoint: cgfs.c: cgroup_rmdir: 207 Device or resource busy - cgroup_rmdir: failed to delete /sys/fs/cgroup/cpuset/lxc/d1-2
> > > > lxc-checkpoint: lxccontainer.c: do_lxcapi_restore: 3772 restore process died
> > > Restoring d1 failed.
>
> I've seen these from the restore code before and they're benign
> (basically, the restore failed and not all the tasks were wait()ed on
> before we try to delete the cgroup). That said, it's ugly and I'll try
> to post a fix soon.
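
A minimal sketch of the ordering described above, written in Python rather than LXC's C: wait() on every task first, and only then remove the cgroup directory. The function names and the retry parameters are hypothetical and are not LXC's API. It assumes the caller knows the PIDs of its own children and that a cgroup directory with tasks still attached fails rmdir() with EBUSY, as in the log.

```python
import errno
import os
import time


def reap_children(pids):
    """Block until every listed child has exited and been wait()ed on."""
    for pid in pids:
        try:
            os.waitpid(pid, 0)
        except ChildProcessError:
            pass  # not our child, or already reaped elsewhere


def remove_cgroup_dir(path, retries=5, delay=0.1):
    """rmdir() the directory, retrying while the kernel reports EBUSY.

    Returns True once the directory is gone, False if it stayed busy.
    Any error other than EBUSY is re-raised.
    """
    for _ in range(retries):
        try:
            os.rmdir(path)
            return True
        except OSError as err:
            if err.errno != errno.EBUSY:
                raise
            time.sleep(delay)
    return False


def cleanup_after_failed_restore(pids, cgroup_dir):
    """Reap the tasks first, then delete the cgroup directory."""
    reap_children(pids)
    return remove_cgroup_dir(cgroup_dir)
```

For the restore above, cgroup_dir would be a path such as /sys/fs/cgroup/freezer/lxc/d1-2, with one call per controller. On an ordinary directory the EBUSY branch is never taken, so the retry loop only matters on a real cgroup hierarchy.
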
>
> > > > Warn  (cr-restore.c:1041): Set CLONE_PARENT | CLONE_NEWPID but it might cause restore problem,because not all kernels support such clone flags combinations!
> > > > RTNETLINK answers: File exists
> > > > RTNETLINK answers: File exists
> > > > RTNETLINK answers: File exists
> > > > RTNETLINK answers: File exists
> > > >      1: Warn  (sk-unix.c:1229): sk unix: Can't unlink stale socket 0x36a8 peer 0 (name /run/systemd/notify dir -)
> > > >      1: Warn  (sk-unix.c:1229): sk unix: Can't unlink stale socket 0x36aa peer 0 (name /run/systemd/private dir -)
> > > >      1: Warn  (sk-unix.c:1229): sk unix: Can't unlink stale socket 0x36b4 peer 0 (name /run/systemd/shutdownd dir -)
> > > >      1: Warn  (sk-unix.c:1229): sk unix: Can't unlink stale socket 0x36b6 peer 0 (name /run/systemd/journal/dev-log dir -)
> > > >      1: Warn  (sk-unix.c:1229): sk unix: Can't unlink stale socket 0x36ba peer 0 (name /run/systemd/journal/stdout dir -)
> > > >      1: Warn  (sk-unix.c:1229): sk unix: Can't unlink stale socket 0x36bc peer 0 (name /run/systemd/journal/socket dir -)
> > > >      1: Warn  (sk-unix.c:1229): sk unix: Can't unlink stale socket 0x5bad peer 0x70ea (name /run/systemd/journal/stdout dir -)
> > > >      1: Warn  (sk-unix.c:1229): sk unix: Can't unlink stale socket 0x6da7 peer 0x3788 (name /run/systemd/journal/stdout dir -)
> > > >      1: Warn  (sk-unix.c:1229): sk unix: Can't unlink stale socket 0x6da6 peer 0x5f21 (name /run/systemd/journal/stdout dir -)
> > > >      1: Warn  (sk-unix.c:1229): sk unix: Can't unlink stale socket 0x6da8 peer 0x784b (name /run/systemd/journal/stdout dir -)
> > > >      1: Warn  (sk-unix.c:1229): sk unix: Can't unlink stale socket 0x6da9 peer 0x6b10 (name /run/systemd/journal/stdout dir -)
> > > >      1: Warn  (sk-unix.c:1229): sk unix: Can't unlink stale socket 0x6daa peer 0x6159 (name /run/systemd/journal/stdout dir -)
> > > >     68: Error (sk-packet.c:419): Can't bind packet socket: Invalid argument
> > > Error (cr-restore.c:1236): 3159 killed by signal 19
> > > Error (cr-restore.c:1236): 3159 killed by signal 19
> > > Error (cr-restore.c:1933): Restoring FAILED.
>
> Here is the real problem: bind() is failing, probably because the unlink
> above failed. Unfortunately, we don't log the reason for the bind()
> failure. Can you try with the attached patch?
>
> Pavel, perhaps we should apply this so it does report the error?
>
> Tycho
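
To illustrate why reporting the reason matters, here is a small Python sketch of a bind() wrapper that includes the errno name and its description in the message. It is not the attached patch (which is against CRIU's C code and is not shown in this thread); the function name bind_or_explain is hypothetical, and a Unix socket is used instead of a packet socket only because it can be tried without root.

```python
import errno
import os
import socket


def bind_or_explain(sock, address):
    """Try sock.bind(address).

    Returns None on success, or a message that names the errno so the
    log shows why bind() failed instead of only that it failed.
    """
    try:
        sock.bind(address)
        return None
    except OSError as err:
        code = err.errno
        name = errno.errorcode.get(code, "UNKNOWN")
        reason = os.strerror(code) if code is not None else str(err)
        return f"Can't bind socket: {name} ({reason})"
```

Binding an AF_UNIX socket to a path inside a directory that does not exist yields a message containing ENOENT. The failure in the log above corresponds to EINVAL ("Invalid argument") on the packet socket.
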
>
> > > --- Checkpoint/Restore ---
> > > checkpoint restore: enabled
> > > CONFIG_FHANDLE: enabled
> > > CONFIG_EVENTFD: enabled
> > > CONFIG_EPOLL: enabled
> > > CONFIG_UNIX_DIAG: enabled
> > > CONFIG_INET_DIAG: enabled
> > > CONFIG_PACKET_DIAG: enabled
> > > CONFIG_NETLINK_DIAG: enabled
> > > File capabilities: enabled
> > >
> > >
> > > > How can I solve this problem? The same failure occurs on Ubuntu.
> > >
> > >
> > >
> > >
> >
>