<div dir="ltr">Hi Pavel,<div><br></div><div>Thanks for the review.</div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Aug 5, 2015 at 4:00 AM, Pavel Emelyanov <span dir="ltr"><<a href="mailto:xemul@parallels.com" target="_blank">xemul@parallels.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Gabriel,<br>
<br>
Please, find my comments inline.<br>
<br>
> diff --git a/crtools.c b/crtools.c<br>
> index 6af6080..521d0ff 100644<br>
> --- a/crtools.c<br>
> +++ b/crtools.c<br>
> @@ -235,6 +235,7 @@ int main(int argc, char *argv[], char *envp[])<br>
> { "enable-fs", required_argument, 0, 1065 },<br>
> { "enable-external-sharing", no_argument, 0, 1066 },<br>
> { "enable-external-masters", no_argument, 0, 1067 },<br>
> + { "overlayfs", no_argument, 0, 1068 },<br>
<br>
With AUFS workaround we ended up with auto-detection scheme -- when<br>
meeting an AUFS mount the opts.aufs is set to true. Can we do the<br>
same with overlayfs?<br></blockquote><div><br></div><div>Technically we could always enable overlayfs by default, since we only do something if there is an overlayfs mounted entry in the mountinfo table. Should I enable it always?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
> { },<br>
> };<br>
><br>
> @@ -465,6 +466,9 @@ int main(int argc, char *argv[], char *envp[])<br>
> case 1067:<br>
> opts.enable_external_masters = true;<br>
> break;<br>
> + case 1068:<br>
> + opts.overlayfs = true;<br>
> + break;<br>
> case 'M':<br>
> {<br>
> char *aux;<br>
> diff --git a/files.c b/files.c<br>
> index c82920d..2c6cca0 100644<br>
> --- a/files.c<br>
> +++ b/files.c<br>
> @@ -30,6 +30,7 @@<br>
> #include "eventfd.h"<br>
> #include "eventpoll.h"<br>
> #include "fsnotify.h"<br>
> +#include "mount.h"<br>
> #include "signalfd.h"<br>
> #include "namespaces.h"<br>
> #include "tun.h"<br>
> @@ -157,6 +158,45 @@ void show_saved_files(void)<br>
> }<br>
><br>
> /*<br>
> + * Workaround for the OverlayFS bug present before Kernel 4.2<br>
> + *<br>
> + * When a process has a file open in an OverlayFS directory,<br>
> + * the information in /proc/<pid>/fd/<fd> and /proc/<pid>/fdinfo/<fd><br>
> + * is wrong, so, if --overlayfs is specified, we grab that information<br>
> + * from the mountinfo table instead.<br>
> + *<br>
> + * This is done every time fill_fdlink is called. See lookup_overlayfs<br>
> + * for more details.<br>
> + */<br>
> +static int fixup_overlayfs(struct fd_parms *p, struct fd_link *link)<br>
> +{<br>
> + struct mount_info *m;<br>
> +<br>
> + if (!link)<br>
> + return 0;<br>
> +<br>
> + m = lookup_overlayfs(link->name, p->stat.st_dev, p->stat.st_ino, p->mnt_id);<br>
> + if (!m)<br>
> + return 0;<br>
<br>
NULL here means both -- error occurred and nothing-to-do. Can you<br>
check for errors using the ERR_PTR notation?<br>
<br>
> +<br>
> + p->mnt_id = m->mnt_id;<br>
> +<br>
> + /* Stores the correct file path in p->link->name */<br>
<br>
Can you extend this comment specifying what the incorrectness is.<br></blockquote><div>Sure.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
> + if (strcmp("./", m->mountpoint) != 0) {<br>
> + char buf[PATH_MAX];<br>
> + int n;<br>
> +<br>
> + strncpy(buf, link->name, PATH_MAX);<br>
> + n = snprintf(link->name, PATH_MAX, "%s/%s", m->mountpoint, buf + 2);<br>
> + if (n >= PATH_MAX) {<br>
> + pr_err("Not enough space to replace %s\n", buf);<br>
> + return -1;<br>
> + }<br>
> + }<br>
> + return 0;<br>
> +}<br>
> +<br>
> +/*<br>
> * The gen_id thing is used to optimize the comparison of shared files.<br>
> * If two files have different gen_ids, then they are different for sure.<br>
> * If it matches, we don't know it and have to call sys_kcmp().<br>
> @@ -206,6 +246,10 @@ int fill_fdlink(int lfd, const struct fd_parms *p, struct fd_link *link)<br>
> }<br>
><br>
> link->len = len + 1;<br>
> +<br>
> + if (opts.overlayfs)<br>
> + if (fixup_overlayfs((struct fd_parms *)p, link) < 0)<br>
> + return -1;<br>
> return 0;<br>
> }<br>
><br>
> diff --git a/include/cr_options.h b/include/cr_options.h<br>
> index 19c2f77..011349c 100644<br>
> --- a/include/cr_options.h<br>
> +++ b/include/cr_options.h<br>
> @@ -79,6 +79,7 @@ struct cr_options {<br>
> bool enable_external_sharing;<br>
> bool enable_external_masters;<br>
> bool aufs; /* auto-deteced, not via cli */<br>
> + bool overlayfs;<br>
> };<br>
><br>
> extern struct cr_options opts;<br>
> diff --git a/include/mount.h b/include/mount.h<br>
> index 0d5fc7f..01da3f5 100644<br>
> --- a/include/mount.h<br>
> +++ b/include/mount.h<br>
> @@ -22,6 +22,8 @@ extern int prepare_mnt_ns(void);<br>
> extern int pivot_root(const char *new_root, const char *put_old);<br>
><br>
> struct mount_info;<br>
> +struct mount_info *lookup_overlayfs(char *rpath, unsigned int s_dev,<br>
> + unsigned int st_ino, unsigned int mnt_id);<br>
> extern struct mount_info *lookup_mnt_id(unsigned int id);<br>
> extern struct mount_info *lookup_mnt_sdev(unsigned int s_dev);<br>
><br>
> diff --git a/mount.c b/mount.c<br>
> index acd74f0..6072050 100644<br>
> --- a/mount.c<br>
> +++ b/mount.c<br>
> @@ -127,6 +127,95 @@ static inline int fsroot_mounted(struct mount_info *mi)<br>
> return is_root(mi->root);<br>
> }<br>
><br>
> +static struct mount_info *__lookup_overlayfs(struct mount_info *list, char *rpath,<br>
> + unsigned int st_dev, unsigned int st_ino,<br>
> + unsigned int mnt_id)<br>
> +{<br>
> + /*<br>
> + * Goes through all entries in the mountinfo table<br>
> + * looking for a mount point that contains the file specified<br>
> + * in rpath. Uses the device number st_dev and the inode number st_ino<br>
> + * to make sure the file is correct.<br>
> + */<br>
> + struct mount_info *mi_ret = NULL;<br>
> + struct mount_info *m;<br>
> +<br>
> + for (m = list; m != NULL; m = m->next) {<br>
> + /* Case 1: If mnt_id is correct, don't do anything */<br>
> + if (st_dev == m->s_dev && mnt_id == m->mnt_id)<br>
> + return NULL;<br>
<br>
Shouldn't this if () be under the next one, so that everything that happens<br>
here applies to overlayfs only?</blockquote><div> </div><div>This if () is basically a check to see if the bug exists. We look for all instances of the device st_dev and see if any of them match our mnt_id. If that's not the case, the bug exists and we want to proceed to try to fix it. </div><div><br></div><div>I could have left this as a separate for loop to make this more clear, but this way we only use one for loop for both. I believe it's better if we separate it into two for loops since we prevent some unnecessary stating. I can also move this part to lookup_overlayfs, before dealing with namespaces. What do you think?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
> +<br>
> + /*<br>
> + * Case 2: If overlayFS we try to find the correct mnt_id.<br>
> + *<br>
> + * Note that even if we do find something, we want to keep looking for<br>
> + * Case 1, which takes precedence.<br>
> + */<br>
> + if (m->fstype->code == FSTYPE__OVERLAYFS) {<br>
> + char _abs_path[PATH_MAX];<br>
> + struct stat f_stat;<br>
> + int ret_stat;<br>
> +<br>
> + /* Concatenates m->mountpoint with rpath and attempts to stat the resulting path */<br>
> + if (strcmp("./", m->mountpoint) == 0)<br>
> + ret_stat = stat(rpath + 1, &f_stat);<br>
> + else {<br>
> + int n = snprintf(_abs_path, PATH_MAX, "%s/%s", m->mountpoint + 1, rpath);<br>
> +<br>
> + if (n >= PATH_MAX) {<br>
> + pr_err("Not enough space to concatenate %s and %s\n", m->mountpoint, rpath);<br>
> + return NULL;<br>
> + }<br>
> + ret_stat = stat(_abs_path, &f_stat);<br>
> + }<br>
> +<br>
> + if (ret_stat == 0 && st_dev == f_stat.st_dev && st_ino == f_stat.st_ino)<br>
> + mi_ret = m;<br>
> + }<br>
> + }<br>
> +<br>
> + return mi_ret;<br>
> +}<br>
> +<br>
> +/*<br>
> + * Looks up the mnt_id and path of a file in an overlayFS directory.<br>
> + *<br>
> + * This is useful in order to fix the OverlayFS bug present in the<br>
> + * Linux Kernel before version 4.2. See fixup_overlayfs for details.<br>
> + *<br>
> + * We first enter the mount namespace of the process and check to see<br>
> + * if there are any overlayFS mounted directories in the mountinfo table.<br>
> + * If so, we concatenate the mountpoint with the name of the file, and<br>
> + * stat the resulting path to check if we found the correct device id<br>
> + * and node number. If that is the case, we update the mount id and<br>
> + * link variables with the correct values.<br>
> + *<br>
> + * This update is skipped if there is an entry in the<br>
> + * mountinfo table with the same known device number and<br>
> + * mount id, signifying that we already have the correct id.<br>
> + *<br>
> + * At the end of this procedure, we exit the mount namespace<br>
> + * of the process, back to the original one.<br>
> + */<br>
> +struct mount_info *lookup_overlayfs(char *rpath, unsigned int st_dev,<br>
> + unsigned int st_ino, unsigned int mnt_id)<br>
> +{<br>
> + int ns_old = -1;<br>
> + struct mount_info *mi;<br>
> +<br>
> + if (switch_ns(root_item->pid.real, &mnt_ns_desc, &ns_old) < 0) {<br>
<br>
The switch_ns call is quite slow. If you only need to stat the file by path<br>
in proper mount namespace, you can get the namespace's root fd with the<br>
mntns_get_root_fd() call and then do fstatat(mntns_root, path, ...);</blockquote><div><br></div><div>I'll look into this!</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> <br></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
> + pr_err("Failed to switch into mount namespace of PID %d\n", root_item->pid.real);<br>
> + return NULL;<br>
> + }<br>
> + mi = __lookup_overlayfs(mntinfo, rpath, st_dev, st_ino, mnt_id);<br>
> +<br>
> + if (restore_ns(ns_old, &mnt_ns_desc)) {<br>
> + pr_err("Failed to restore to old mount namespace %d\n", ns_old);<br>
> + return NULL;<br>
> + }<br>
> + return mi;<br>
> +}<br>
> +<br>
> static struct mount_info *__lookup_mnt_id(struct mount_info *list, int id)<br>
> {<br>
> struct mount_info *m;<br>
> @@ -1365,6 +1454,9 @@ static struct fstype fstypes[32] = {<br>
> .code = FSTYPE__FUSE,<br>
> .dump = always_fail,<br>
> .restore = always_fail,<br>
> + }, {<br>
> + .name = "overlay",<br>
> + .code = FSTYPE__OVERLAYFS,<br>
> },<br>
> };<br>
><br>
> diff --git a/protobuf/mnt.proto b/protobuf/mnt.proto<br>
> index 6f8e7d1..6e58e1d 100644<br>
> --- a/protobuf/mnt.proto<br>
> +++ b/protobuf/mnt.proto<br>
> @@ -18,6 +18,7 @@ enum fstype {<br>
> MQUEUE = 14;<br>
> FUSE = 15;<br>
> AUTO = 16;<br>
> + OVERLAYFS = 17;<br>
> };<br>
><br>
> message mnt_entry {<br>
<span class="HOEnZb"><font color="#888888"><br>
-- Pavel<br>
</font></span></blockquote></div><br></div></div>