<div dir="ltr">OK! I have applied this patch. The error message is this:<div><br><div><div>Warn (cr-restore.c:1047): Set CLONE_PARENT | CLONE_NEWPID but it might cause restore problem,because not all kernels support such clone flags combinations!</div><div>RTNETLINK answers: File exists</div><div>RTNETLINK answers: File exists</div><div>RTNETLINK answers: File exists</div><div>RTNETLINK answers: File exists</div><div>RTNETLINK answers: File exists</div><div> 1: Error (sk-unix.c:1245): sk unix: Can't unlink stale socket 0x2c4ff peer 0 (name /run/systemd/notify dir -)</div><div>: No such file or directory</div><div> 1: Error (sk-unix.c:1245): sk unix: Can't unlink stale socket 0x2c501 peer 0 (name /run/systemd/private dir -)</div><div>: No such file or directory</div><div> 1: Error (sk-unix.c:1245): sk unix: Can't unlink stale socket 0x2c50b peer 0 (name /run/systemd/shutdownd dir -)</div><div>: No such file or directory</div><div> 1: Error (sk-unix.c:1245): sk unix: Can't unlink stale socket 0x2c50d peer 0 (name /run/systemd/journal/dev-log dir -)</div><div>: No such file or directory</div><div> 1: Error (sk-unix.c:1245): sk unix: Can't unlink stale socket 0x2c511 peer 0 (name /run/systemd/journal/stdout dir -)</div><div>: No such file or directory</div><div> 1: Error (sk-unix.c:1245): sk unix: Can't unlink stale socket 0x2c513 peer 0 (name /run/systemd/journal/socket dir -)</div><div>: No such file or directory</div><div> 1: Error (sk-unix.c:1245): sk unix: Can't unlink stale socket 0x2b88d peer 0x28bc5 (name /run/systemd/journal/stdout dir -)</div><div>: No such file or directory</div><div> 1: Error (sk-unix.c:1245): sk unix: Can't unlink stale socket 0x2c289 peer 0x2abaf (name /run/systemd/journal/stdout dir -)</div><div>: No such file or directory</div><div> 1: Error (sk-unix.c:1245): sk unix: Can't unlink stale socket 0x2c28c peer 0x2b4cf (name /run/systemd/journal/stdout dir -)</div><div>: No such file or directory</div><div> 1: Error (sk-unix.c:1245): sk 
unix: Can't unlink stale socket 0x2c28d peer 0x2b33d (name /run/systemd/journal/stdout dir -)</div><div>: No such file or directory</div><div> 1: Error (sk-unix.c:1245): sk unix: Can't unlink stale socket 0x2c28a peer 0x2c8fa (name /run/systemd/journal/stdout dir -)</div><div>: No such file or directory</div><div> 1: Error (sk-unix.c:1245): sk unix: Can't unlink stale socket 0x2c28b peer 0x2b902 (name /run/systemd/journal/stdout dir -)</div><div>: No such file or directory</div><div> 68: Error (sk-packet.c:419): Can't bind packet socket: Invalid argument</div><div>Error (cr-restore.c:1235): 28680 killed by signal 19</div><div>Error (cr-restore.c:1235): 28680 killed by signal 19</div><div>Error (cr-restore.c:1959): Restoring FAILED.</div></div><div><br></div><div>BTW my network config is:</div><div><div>br0 Link encap:Ethernet HWaddr 40:f2:e9:d2:81:38 </div><div> inet addr:xxx Bcast:xxx Mask:255.255.255.0</div><div> inet6 addr: xxx Scope:Global</div><div> inet6 addr: xxx Scope:Link</div><div> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1</div><div> RX packets:1472399 errors:0 dropped:0 overruns:0 frame:0</div><div> TX packets:30432 errors:0 dropped:0 overruns:0 carrier:0</div><div> collisions:0 txqueuelen:0 </div><div> RX bytes:115725262 (110.3 MiB) TX bytes:4463645 (4.2 MiB)</div><div><br></div><div>eth0 Link encap:Ethernet HWaddr 40:f2:e9:d2:81:38 </div><div> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1</div><div> RX packets:1642533 errors:0 dropped:1196 overruns:0 frame:0</div><div> TX packets:100308 errors:0 dropped:0 overruns:0 carrier:0</div><div> collisions:0 txqueuelen:1000 </div><div> RX bytes:352311619 (335.9 MiB) TX bytes:9232447 (8.8 MiB)</div><div> Memory:90580000-9059ffff </div><div><br></div><div>lo Link encap:Local Loopback </div><div> inet addr:127.0.0.1 Mask:255.0.0.0</div><div> inet6 addr: ::1/128 Scope:Host</div><div> UP LOOPBACK RUNNING MTU:65536 Metric:1</div><div> RX packets:0 errors:0 dropped:0 overruns:0 frame:0</div><div> TX 
packets:0 errors:0 dropped:0 overruns:0 carrier:0</div><div> collisions:0 txqueuelen:0 </div><div> RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)</div></div><div><br></div><div><br></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">2015-10-15 2:54 GMT+08:00 Tycho Andersen <span dir="ltr"><<a href="mailto:tycho.andersen@canonical.com" target="_blank">tycho.andersen@canonical.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Jason, Pavel,<br>
<div><div class="h5"><br>
On Wed, Oct 14, 2015 at 03:03:37PM +0300, Pavel Emelyanov wrote:<br>
> Adding Tycho (an LXC guy) to the discussion.<br>
><br>
> On 10/14/2015 06:56 AM, Jason Lee wrote:<br>
> > Hi all!<br>
> > Recently I have been using lxc-checkpoint to checkpoint/restore a Linux container.<br>
> > Dumping with CRIU works without problems, but restoring the container with lxc-checkpoint -r fails.<br>
> > BTW, my host OS is Debian 8. Here is my environment:<br>
> ><br>
> > lxc.rootfs = /usr/local/var/lib/lxc/d1/rootfs<br>
> > lxc.include = /usr/local/share/lxc/config/debian.common.conf<br>
> > lxc.utsname = d1<br>
> > lxc.arch = amd64<br>
> > lxc.tty = 0<br>
> > lxc.pts = 1<br>
> > lxc.console = none<br>
> ><br>
> > #lxc.cap.drop = sys_module mac_admin mac_override sys_time<br>
> > lxc.cgroup.devices.deny = c 5:1 rwm<br>
> > lxc.aa_allow_incomplete = 1<br>
> > lxc.network.type = veth<br>
> > lxc.network.flags = up<br>
> > # that's the interface defined above in host's interfaces file<br>
> > lxc.network.link = br0<br>
> > # name of network device inside the container,<br>
> > # defaults to eth0, you could choose a name freely<br>
> > # <a href="http://lxc.network.name" rel="noreferrer" target="_blank">lxc.network.name</a> <<a href="http://lxc.network.name" rel="noreferrer" target="_blank">http://lxc.network.name</a>> = lxcnet0<br>
> > lxc.network.hwaddr = 00:16:3e:d2:29:be<br>
> ><br>
> > mount point:<br>
> > root@dslab:/home# mount<br>
> > sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)<br>
> > proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)<br>
> > udev on /dev type devtmpfs (rw,relatime,size=10240k,nr_inodes=1002688,mode=755)<br>
> > devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)<br>
> > tmpfs on /run type tmpfs (rw,nosuid,relatime,size=1607656k,mode=755)<br>
> > /dev/sda6 on / type ext4 (rw,relatime,errors=remount-ro,data=ordered)<br>
> > securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)<br>
> > tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)<br>
> > tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)<br>
> > tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)<br>
> > cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)<br>
> > pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)<br>
> > cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset,clone_children)<br>
> > cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)<br>
> > cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)<br>
> > cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)<br>
> > cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)<br>
> > cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)<br>
> > cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)<br>
> > cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)<br>
> > cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)<br>
> > cgroup on /sys/fs/cgroup/debug type cgroup (rw,nosuid,nodev,noexec,relatime,debug)<br>
> > cgroup on /sys/fs/cgroup/palloc type cgroup (rw,nosuid,nodev,noexec,relatime,palloc)<br>
> > systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=23,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)<br>
> > debugfs on /sys/kernel/debug type debugfs (rw,relatime)<br>
> > mqueue on /dev/mqueue type mqueue (rw,relatime)<br>
> > hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)<br>
> > /dev/sda4 on /boot type ext4 (rw,relatime,data=ordered)<br>
> > rpc_pipefs on /run/rpc_pipefs type rpc_pipefs (rw,relatime)<br>
> ><br>
> > root@dslab:/home# lxc-checkpoint -r -n d1 -D /home/checkpoint_dir/d2/<br>
> > lxc-checkpoint: cgfs.c: cgroup_rmdir: 207 Device or resource busy - cgroup_rmdir: failed to delete /sys/fs/cgroup/palloc/lxc/d1-2<br>
> > lxc-checkpoint: cgfs.c: cgroup_rmdir: 207 Device or resource busy - cgroup_rmdir: failed to delete /sys/fs/cgroup/debug/lxc/d1-2<br>
> > lxc-checkpoint: cgfs.c: cgroup_rmdir: 207 Device or resource busy - cgroup_rmdir: failed to delete /sys/fs/cgroup/hugetlb/lxc/d1-2<br>
> > lxc-checkpoint: cgfs.c: cgroup_rmdir: 207 Device or resource busy - cgroup_rmdir: failed to delete /sys/fs/cgroup/perf_event/lxc/d1-2<br>
> > lxc-checkpoint: cgfs.c: cgroup_rmdir: 207 Device or resource busy - cgroup_rmdir: failed to delete /sys/fs/cgroup/blkio/lxc/d1-2<br>
> > lxc-checkpoint: cgfs.c: cgroup_rmdir: 207 Device or resource busy - cgroup_rmdir: failed to delete /sys/fs/cgroup/net_cls,net_prio/lxc/d1-2<br>
> > lxc-checkpoint: cgfs.c: cgroup_rmdir: 207 Device or resource busy - cgroup_rmdir: failed to delete /sys/fs/cgroup/freezer/lxc/d1-2<br>
> > lxc-checkpoint: cgfs.c: cgroup_rmdir: 207 Device or resource busy - cgroup_rmdir: failed to delete /sys/fs/cgroup/devices/lxc/d1-2<br>
> > lxc-checkpoint: cgfs.c: cgroup_rmdir: 207 Device or resource busy - cgroup_rmdir: failed to delete /sys/fs/cgroup/memory/lxc/d1-2<br>
> > lxc-checkpoint: cgfs.c: cgroup_rmdir: 207 Device or resource busy - cgroup_rmdir: failed to delete /sys/fs/cgroup/cpu,cpuacct/lxc/d1-2<br>
> > lxc-checkpoint: cgfs.c: cgroup_rmdir: 207 Device or resource busy - cgroup_rmdir: failed to delete /sys/fs/cgroup/cpuset/lxc/d1-2<br>
> > lxc-checkpoint: lxccontainer.c: do_lxcapi_restore: 3772 restore process died<br>
> > Restoring d1 failed.<br>
<br>
</div></div>I've seen these from the restore code before and they're benign<br>
(basically, the restore failed and not all the tasks were wait()ed on<br>
before we try to delete the cgroup). That said, it's ugly and I'll try<br>
to post a fix soon.<br>
<span class=""><br>
> > Warn (cr-restore.c:1041): Set CLONE_PARENT | CLONE_NEWPID but it might cause restore problem,because not all kernels support such clone flags combinations!<br>
> > RTNETLINK answers: File exists<br>
> > RTNETLINK answers: File exists<br>
> > RTNETLINK answers: File exists<br>
> > RTNETLINK answers: File exists<br>
> > 1: Warn (sk-unix.c:1229): sk unix: Can't unlink stale socket 0x36a8 peer 0 (name /run/systemd/notify dir -)<br>
> > 1: Warn (sk-unix.c:1229): sk unix: Can't unlink stale socket 0x36aa peer 0 (name /run/systemd/private dir -)<br>
> > 1: Warn (sk-unix.c:1229): sk unix: Can't unlink stale socket 0x36b4 peer 0 (name /run/systemd/shutdownd dir -)<br>
> > 1: Warn (sk-unix.c:1229): sk unix: Can't unlink stale socket 0x36b6 peer 0 (name /run/systemd/journal/dev-log dir -)<br>
> > 1: Warn (sk-unix.c:1229): sk unix: Can't unlink stale socket 0x36ba peer 0 (name /run/systemd/journal/stdout dir -)<br>
> > 1: Warn (sk-unix.c:1229): sk unix: Can't unlink stale socket 0x36bc peer 0 (name /run/systemd/journal/socket dir -)<br>
> > 1: Warn (sk-unix.c:1229): sk unix: Can't unlink stale socket 0x5bad peer 0x70ea (name /run/systemd/journal/stdout dir -)<br>
> > 1: Warn (sk-unix.c:1229): sk unix: Can't unlink stale socket 0x6da7 peer 0x3788 (name /run/systemd/journal/stdout dir -)<br>
> > 1: Warn (sk-unix.c:1229): sk unix: Can't unlink stale socket 0x6da6 peer 0x5f21 (name /run/systemd/journal/stdout dir -)<br>
> > 1: Warn (sk-unix.c:1229): sk unix: Can't unlink stale socket 0x6da8 peer 0x784b (name /run/systemd/journal/stdout dir -)<br>
> > 1: Warn (sk-unix.c:1229): sk unix: Can't unlink stale socket 0x6da9 peer 0x6b10 (name /run/systemd/journal/stdout dir -)<br>
> > 1: Warn (sk-unix.c:1229): sk unix: Can't unlink stale socket 0x6daa peer 0x6159 (name /run/systemd/journal/stdout dir -)<br>
> > 68: Error (sk-packet.c:419): Can't bind packet socket: Invalid argument<br>
> > Error (cr-restore.c:1236): 3159 killed by signal 19<br>
> > Error (cr-restore.c:1236): 3159 killed by signal 19<br>
> > Error (cr-restore.c:1933): Restoring FAILED.<br>
<br>
</span>Here is the real problem: bind() is failing, probably because the unlink<br>
above failed. Unfortunately, we don't log the reason bind() failed. Can you<br>
try with the attached patch?<br>
<br>
Pavel, perhaps we should apply this so it does report the error?<br>
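<br>
To illustrate why the errno matters, here is a minimal sketch in Python (not CRIU code, and not the attached patch). The helper name try_bind and the paths are hypothetical. It uses a UNIX socket rather than the packet socket from the log, and only shows that a leftover socket file and a missing directory produce different errno values, which is the information the current message does not report.<br>

```python
import errno
import os
import socket
import tempfile


def try_bind(path):
    """Bind a UNIX datagram socket to `path`.

    Hypothetical helper for illustration only. Returns 0 on success,
    or the errno value reported by the failed bind().
    """
    sk = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sk.bind(path)
        return 0
    except OSError as e:
        # Log the reason, not just the fact that bind() failed.
        print("Can't bind socket to %s: %s" % (path, e.strerror))
        return e.errno
    finally:
        # Closing the socket does NOT remove the file it was bound to,
        # so a successful bind leaves a "stale" socket file behind.
        sk.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo.sock")
        first = try_bind(path)    # succeeds and creates the socket file
        second = try_bind(path)   # fails: the stale file is still there
        print(first, errno.errorcode.get(second, second))
```

On Linux the second bind should fail with EADDRINUSE, while a bind into a directory that does not exist fails with ENOENT. The log above shows EINVAL for the packet socket, so the stale-socket theory may not be the whole story; that is why reporting the errno is useful.<br>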
<span class="HOEnZb"><font color="#888888"><br>
Tycho<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
> > --- Checkpoint/Restore ---<br>
> > checkpoint restore: enabled<br>
> > CONFIG_FHANDLE: enabled<br>
> > CONFIG_EVENTFD: enabled<br>
> > CONFIG_EPOLL: enabled<br>
> > CONFIG_UNIX_DIAG: enabled<br>
> > CONFIG_INET_DIAG: enabled<br>
> > CONFIG_PACKET_DIAG: enabled<br>
> > CONFIG_NETLINK_DIAG: enabled<br>
> > File capabilities: enabled<br>
> ><br>
> ><br>
> > How can I solve this problem? The same thing happens on Ubuntu.<br>
> ><br>
> ><br>
> ><br>
> > _______________________________________________<br>
> > CRIU mailing list<br>
> > <a href="mailto:CRIU@openvz.org">CRIU@openvz.org</a><br>
> > <a href="https://lists.openvz.org/mailman/listinfo/criu" rel="noreferrer" target="_blank">https://lists.openvz.org/mailman/listinfo/criu</a><br>
> ><br>
><br>
</div></div></blockquote></div><br></div>