[CRIU] AUFS Support in CRIU
Saied Kazemi
saied at google.com
Mon Aug 11 16:28:32 PDT 2014
Hi Pavel,
Attached please find a patch file for supporting AUFS in CRIU. The main
reason that we need the patch is to compensate for missing and/or erroneous
information that the kernel reports in /proc/<PID>/mountinfo and
/proc/<PID>/map_files (more details below). Once these are corrected in the
kernel, we can revert the patch.
I tried to minimize the changes to CRIU's code, so the majority of the code
is in a new file, sysfs_parse.c. Also, please consider this a quick
workaround to get CRIU to dump and restore Docker containers that use AUFS
(the default graph driver). Feel free to make any changes.
Here are the mountinfo and map_files issues that I ran into:
1) /proc/<PID>/mountinfo
The issue is that for AUFS the root entry looks something like:
90 61 0:33 / / rw,relatime - aufs none rw,si=4476a910a24617e6
Compared to VFS (which CRIU supports with no issues), the root field is
missing the pathname (it is only /), and the device, source, and options
fields do not match the values of the underlying device and filesystem. As
a result, external bind mounts like /etc/hosts are not mounted upon restore
(mounts_equal() fails). To compensate, I specify the root pathname with the
--aufs-root option and a "reference" file with the --aufs-ref option to
"fix up" the fields. There must be a better way to do this, but since I
hope this will be temporary code, I didn't spend too much time
investigating.
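To make the field comparison above concrete, here is a minimal Python
sketch that splits a mountinfo line into its fields, following the layout
described in proc(5). It is not taken from the patch: MountInfo,
parse_mountinfo_line and aufs_si are hypothetical names used only for
illustration, and reading the si= value as an AUFS superblock identifier is
an assumption based on the sample line.

```python
from typing import NamedTuple, Optional


class MountInfo(NamedTuple):
    mount_id: int
    parent_id: int
    dev: str            # "major:minor" of the superblock
    root: str           # root of the mount within its filesystem
    mount_point: str
    options: str        # per-mount options
    fstype: str
    source: str
    super_options: str  # per-superblock options


def parse_mountinfo_line(line: str) -> MountInfo:
    # Optional fields (such as "shared:1") sit between the mount options
    # and the "-" separator, so split on the separator first.
    left, right = line.rstrip("\n").split(" - ", 1)
    head = left.split()
    fstype, source, super_options = right.split(" ", 2)
    return MountInfo(int(head[0]), int(head[1]), head[2], head[3],
                     head[4], head[5], fstype, source, super_options)


def aufs_si(entry: MountInfo) -> Optional[str]:
    # Return the value of the si= superblock option, or None if absent.
    for opt in entry.super_options.split(","):
        if opt.startswith("si="):
            return opt[3:]
    return None


entry = parse_mountinfo_line(
    "90 61 0:33 / / rw,relatime - aufs none rw,si=4476a910a24617e6")
print(entry.root, entry.fstype, entry.source, aufs_si(entry))
```

For the AUFS entry shown earlier, the root is just "/" and the source is
"none", which is why these fields cannot be compared directly with those of
the underlying filesystem.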
2) /proc/<PID>/map_files
The issue is that symbolic links point to the internals of AUFS, namely the
absolute pathnames in AUFS branches (versus pathnames from the root of the
mount namespace). Below is an example:
lr-------- 1 root root 64 Jul 23 17:15 400000-489000 -> /var/lib/docker/aufs/diff/<ID>/bin/busybox
where /var/lib/docker/aufs/diff/<ID> is an AUFS branch. The link should
point to /bin/busybox instead. To compensate for this, I specify the
--aufs option to parse branch pathnames through /sys/fs/aufs and replace
them with pathnames from the root during lookups (i.e., stat(), fstat()).
The good news is that when we do this during dump, we won't have to specify
any AUFS options during restore.
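The pathname replacement described above can be sketched in Python as
follows. This is only an illustration under stated assumptions, not the
code in sysfs_parse.c: branch_dirs and to_root_path are hypothetical names,
and the "<path>=<perm>" form of the branch entries is an assumption about
what the files under /sys/fs/aufs contain.

```python
def branch_dirs(entries):
    # Each entry is assumed to look like "<path>=<perm>", for example
    # "/var/lib/docker/aufs/diff/<ID>=rw"; keep only the path part.
    return [e.strip().rsplit("=", 1)[0] for e in entries]


def to_root_path(path, branches):
    # Strip a leading branch directory so the result is relative to the
    # root of the mount namespace.
    for br in branches:
        br = br.rstrip("/")
        if path == br:
            return "/"
        # Matching on br + "/" keeps ".../<ID>-init" from being mistaken
        # for a path inside ".../<ID>".
        if path.startswith(br + "/"):
            return path[len(br):]
    return path  # not inside any branch: leave unchanged


branches = branch_dirs([
    "/var/lib/docker/aufs/diff/<ID>=rw",
    "/var/lib/docker/aufs/diff/<ID>-init=ro",
])
print(to_root_path("/var/lib/docker/aufs/diff/<ID>/bin/busybox", branches))
```

With the branch list above, the map_files target from the example becomes
/bin/busybox, while pathnames outside every branch pass through unchanged.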
The patch is against commit 7a203afe0a4 from August 7th. When I rebased to
the head today, I got fatal error messages from the recently added cgroups
code, so I reverted to that commit.
Below is the log of my test. Please let me know what you think.
Thanks,
--Saied
# docker run -d busybox:latest /bin/sh -c 'i=0; while true; do echo $i >> /foo; i=`expr $i + 1`; sleep 3; done'
<ID>
# ps -efl | grep /bin/sh
4 S root 4423 27064 0 80 0 - 791 wait 16:00 ? 00:00:00 /bin/sh -c i=0; while true; do echo $i >> /foo; i=`expr $i + 1`; sleep 3; done
0 S root 4475 27195 0 80 0 - 2936 pipe_w 16:01 pts/4 00:00:00 grep --color=auto /bin/sh
# criu dump -D /tmp/img.aufs -o dump.log -v4 --evasive-devices \
--ext-mount-map /etc/resolv.conf:/etc/resolv.conf \
--ext-mount-map /etc/hostname:/etc/hostname \
--ext-mount-map /etc/hosts:/etc/hosts \
--aufs --aufs-root /var/lib/docker/aufs/mnt/<ID> --aufs-ref /etc/hosts \
-t 4423
# grep "finished successfully" /tmp/img.aufs/dump.log
(00.029719) Dumping finished successfully
# ps -efl | grep /bin/sh
0 S root 4532 27195 0 80 0 - 2936 pipe_w 16:01 pts/4 00:00:00 grep --color=auto /bin/sh
# mount -t aufs -o br=/var/lib/docker/aufs/diff/<ID>:\
/var/lib/docker/aufs/diff/<ID>-init:\
/var/lib/docker/aufs/diff/<BRID1>:\
/var/lib/docker/aufs/diff/<BRID2>:\
/var/lib/docker/aufs/diff/<BRID3>:\
/var/lib/docker/aufs/diff/<BRID4> \
none /var/lib/docker/aufs/mnt/<ID>
# criu restore -D /tmp/img.aufs -o restore.log -v4 -d \
--root /var/lib/docker/aufs/mnt/<ID> --pidfile /tmp/img.aufs/restore.pid \
--ext-mount-map /etc/resolv.conf:/etc/resolv.conf \
--ext-mount-map /etc/hostname:/var/lib/docker/containers/<ID>/hostname \
--ext-mount-map /etc/hosts:/var/lib/docker/containers/<ID>/hosts
# grep "finished successfully" /tmp/img.aufs/restore.log
(00.451924) Restore finished successfully. Resuming tasks.
# ps -efl | grep /bin/sh
5 S root 4569 1 0 80 0 - 791 wait 16:01 ? 00:00:00 /bin/sh -c i=0; while true; do echo $i >> /foo; i=`expr $i + 1`; sleep 3; done
0 S root 4594 27195 0 80 0 - 2936 pipe_w 16:01 pts/4 00:00:00 grep --color=auto /bin/sh
-------------- next part --------------
A non-text attachment was scrubbed...
Name: aufs.patch
Type: application/octet-stream
Size: 21161 bytes
Desc: not available
URL: <http://lists.openvz.org/pipermail/criu/attachments/20140811/8b31b055/attachment-0001.obj>