[CRIU] OverlayFS Patch
Gabriel Guimaraes
gabriellimaguimaraes at gmail.com
Wed Aug 5 16:21:50 PDT 2015
So I just tested it and it turns out that, because of the Kernel bug,
statfs returns ext2/ext3 instead of OVERLAYFS_MAGIC when used on a symlink
to a file in an OVFs directory, so we can't rely on that.
The following patch enables opts.overlayfs = true in main and makes sure
that we only call __mntns_get_root_fd and fstatat if there is an OverlayFS
mounted directory in the mountinfo table. Pavel, if you want we can remove
opts.overlayfs altogether.
Let me know what you think!
2015-08-05 15:12 GMT-07:00 Gabriel Guimaraes <gabriellimaguimaraes at gmail.com
>:
> Anyway, as is we can simply enable opts.overlayfs = true always, and make
> sure that __mntns_get_root_fd and fstatat get called only if there is an
> OverlayFS entry in the mountinfo table. What do you think of that?
>
> 2015-08-05 15:06 GMT-07:00 Gabriel Guimaraes <
> gabriellimaguimaraes at gmail.com>:
>
>> Hmm, I'm using GDB to look directly at the value of p->fs_type within
>> fill_fdlink, when p refers to a file within a Docker container mounted with
>> Overlayfs and I get p->fs_type = 0xef53
>>
>> 2015-08-05 14:36 GMT-07:00 Filipe Brandenburger <filbranden at google.com>:
>>
>>> More specifically, I'm looking at this code:
>>>
>>> /**
>>> * ovl_statfs
>>> * @sb: The overlayfs super block
>>> * @buf: The struct kstatfs to fill in with stats
>>> *
>>> * Get the filesystem statistics. As writes always target the upper
>>> layer
>>> * filesystem pass the statfs to the same filesystem.
>>> */
>>> static int ovl_statfs(struct dentry *dentry, struct kstatfs *buf)
>>> {
>>> struct ovl_fs *ofs = dentry->d_sb->s_fs_info;
>>> struct dentry *root_dentry = dentry->d_sb->s_root;
>>> struct path path;
>>> int err;
>>>
>>> ovl_path_upper(root_dentry, &path);
>>>
>>> err = vfs_statfs(&path, buf);
>>> if (!err) {
>>> buf->f_namelen = max(buf->f_namelen, ofs->lower_namelen);
>>> buf->f_type = OVERLAYFS_SUPER_MAGIC;
>>> }
>>>
>>> return err;
>>> }
>>>
>>>
>>> So, indeed, it's forwarding the call to the upper layer (since it's
>>> where writes happen) but before returning it's overwriting f_type with
>>> OVERLAYFS_SUPER_MAGIC which is the constant with the value I posted
>>> above...
>>>
>>> Cheers,
>>> Filipe
>>>
>>>
>>> On Wed, Aug 5, 2015 at 2:34 PM, Filipe Brandenburger
>>> <filbranden at google.com> wrote:
>>> > Can you try a directory instead of a file?
>>> >
>>> > Either a subdirectory or the root of the filesystem (where it's
>>> mounted.)
>>> >
>>> >
>>> > On Wed, Aug 5, 2015 at 2:31 PM, Gabriel Guimaraes
>>> > <gabriellimaguimaraes at gmail.com> wrote:
>>> >> Hi Filipe,
>>> >>
>>> >> I just checked the value of p->fs_type when referring to /tmp/file
>>> within a
>>> >> Docker container, using overlayFS. It reports
>>> >>
>>> >> p->fs_type = 0xef53 which I believe is the magic number of ext2/3/4
>>> >>
>>> >> 2015-08-05 14:25 GMT-07:00 Filipe Brandenburger <
>>> filbranden at google.com>:
>>> >>>
>>> >>> Hi Gabriel,
>>> >>>
>>> >>> Looking at the code, I believe it f_type in statfs() output will
>>> >>> return this value for overlayfs:
>>> >>>
>>> >>> fs/overlayfs/super.c:#define OVERLAYFS_SUPER_MAGIC 0x794c7630
>>> >>>
>>> >>> You can easily double check that with a test program running it
>>> >>> against a mounted overlayfs.
>>> >>>
>>> >>> Cheers,
>>> >>> Filipe
>>> >>>
>>> >>>
>>> >>> On Wed, Aug 5, 2015 at 2:19 PM, Gabriel Guimaraes
>>> >>> <gabriellimaguimaraes at gmail.com> wrote:
>>> >>> >
>>> >>> >
>>> >>> > 2015-08-05 13:36 GMT-07:00 Pavel Emelyanov <xemul at parallels.com>:
>>> >>> >>
>>> >>> >> On 08/05/2015 08:51 PM, Gabriel Guimaraes wrote:
>>> >>> >> > Hi Pavel,
>>> >>> >> >
>>> >>> >> > Thanks for the review.
>>> >>> >> >
>>> >>> >> > On Wed, Aug 5, 2015 at 4:00 AM, Pavel Emelyanov <
>>> xemul at parallels.com
>>> >>> >> > <mailto:xemul at parallels.com>> wrote:
>>> >>> >> >
>>> >>> >> > Gabriel,
>>> >>> >> >
>>> >>> >> > Please, find my comments inline.
>>> >>> >> >
>>> >>> >> > > diff --git a/crtools.c b/crtools.c
>>> >>> >> > > index 6af6080..521d0ff 100644
>>> >>> >> > > --- a/crtools.c
>>> >>> >> > > +++ b/crtools.c
>>> >>> >> > > @@ -235,6 +235,7 @@ int main(int argc, char *argv[], char
>>> >>> >> > *envp[])
>>> >>> >> > > { "enable-fs",
>>> >>> >> > required_argument,
>>> >>> >> > 0, 1065 },
>>> >>> >> > > { "enable-external-sharing", no_argument,
>>> >>> >> > 0, 1066 },
>>> >>> >> > > { "enable-external-masters", no_argument,
>>> >>> >> > 0, 1067 },
>>> >>> >> > > + { "overlayfs", no_argument,
>>> >>> >> > 0, 1068 },
>>> >>> >> >
>>> >>> >> > With AUFS workaround we ended up with auto-detection scheme
>>> --
>>> >>> >> > when
>>> >>> >> > meeting an AUFS mount the opts.aufs is set to true. Can we
>>> do the
>>> >>> >> > same with overlayfs?
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > Technically we could always enable overlayfs by default, since
>>> we
>>> >>> >> > only
>>> >>> >> > do something if there
>>> >>> >> > is an overlayfs mounted entry in the mountinfo table. Should I
>>> enable
>>> >>> >> > it
>>> >>> >> > always?
>>> >>> >>
>>> >>> >> In the fill_fdlink() there's fd_parms structure with fs_type field
>>> >>> >> filled
>>> >>> >> from
>>> >>> >> statfs() syscall. If we check for this value being overlayfs
>>> magic, we
>>> >>> >> can
>>> >>> >> silently
>>> >>> >> imply the workaround is on and go calling fixup_overlayfs().
>>> >>> >>
>>> >>> > Per man statfs, I don't think Overlayfs has a magic number.
>>> OverlayFS
>>> >>> > just
>>> >>> > overlays two filesystems on top of each other. From statfs' point
>>> of
>>> >>> > view, a
>>> >>> > file's filesystem type is the original one, not OverlayFS.
>>> >>> >
>>> >>> > More information here:
>>> >>> > https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt
>>> >>> >>
>>> >>> >> > > +static struct mount_info *__lookup_overlayfs(struct
>>> mount_info
>>> >>> >> > *list, char *rpath,
>>> >>> >> > > + unsigned int
>>> >>> >> > st_dev,
>>> >>> >> > unsigned int st_ino,
>>> >>> >> > > + unsigned int
>>> >>> >> > mnt_id)
>>> >>> >> > > +{
>>> >>> >> > > + /*
>>> >>> >> > > + * Goes through all entries in the mountinfo table
>>> >>> >> > > + * looking for a mount point that contains the file
>>> >>> >> > specified
>>> >>> >> > > + * in rpath. Uses the device number st_dev and the
>>> inode
>>> >>> >> > number st_ino
>>> >>> >> > > + * to make sure the file is correct.
>>> >>> >> > > + */
>>> >>> >> > > + struct mount_info *mi_ret = NULL;
>>> >>> >> > > + struct mount_info *m;
>>> >>> >> > > +
>>> >>> >> > > + for (m = list; m != NULL; m = m->next) {
>>> >>> >> > > + /* Case 1: If mnt_id is correct, don't do
>>> >>> >> > anything
>>> >>> >> > */
>>> >>> >> > > + if (st_dev == m->s_dev && mnt_id ==
>>> m->mnt_id)
>>> >>> >> > > + return NULL;
>>> >>> >> >
>>> >>> >> > Shouldn't this if () be under the next one, so that
>>> everything
>>> >>> >> > that
>>> >>> >> > happens
>>> >>> >> > here applies to overlayfs only?
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > This if () is basically a check to see if the bug exists. We
>>> look for
>>> >>> >> > all instances of the device
>>> >>> >> > st_dev and see if any of them match our mnt_id. If that's not
>>> the
>>> >>> >> > case,
>>> >>> >> > the bug exists and we want
>>> >>> >> > to proceed to try to fix it.
>>> >>> >>
>>> >>> >> OK
>>> >>> >>
>>> >>> >> > I could have left this as a separate for loop to make this more
>>> >>> >> > clear,
>>> >>> >> > but this way we only use
>>> >>> >> > one for loop for both. I believe it's better if we separate it
>>> into
>>> >>> >> > two
>>> >>> >> > for loops since we prevent
>>> >>> >> > some unnecessary stating. I can also move this part to
>>> >>> >> > lookup_overlayfs,
>>> >>> >> > before dealing with
>>> >>> >> > namespaces. What do you think?
>>> >>> >>
>>> >>> >> Yup, makes sense to me.
>>> >>> >>
>>> >>> >
>>> >>
>>> >>
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openvz.org/pipermail/criu/attachments/20150805/1325b1a9/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: overlayfs.patch
Type: text/x-patch
Size: 8580 bytes
Desc: not available
URL: <http://lists.openvz.org/pipermail/criu/attachments/20150805/1325b1a9/attachment-0001.bin>
More information about the CRIU
mailing list