[CRIU] [SRU][F][G][H][PATCH v2] UBUNTU: SAUCE: overlayfs: fix incorrect mnt_id of files opened from map_files
Alexander Mihalicyn
alexander at mihalicyn.com
Tue Apr 27 14:46:21 MSK 2021
On Tue, Apr 27, 2021 at 2:31 PM Christian Brauner
<christian.brauner at ubuntu.com> wrote:
>
> On Tue, Apr 27, 2021 at 01:28:47PM +0200, Christian Brauner wrote:
> > On Tue, Apr 27, 2021 at 01:27:14PM +0200, Christian Brauner wrote:
> > > On Tue, Apr 27, 2021 at 01:47:17PM +0300, Alexander Mihalicyn wrote:
> > > > On Tue, Apr 27, 2021 at 1:43 PM Christian Brauner
> > > > <christian.brauner at ubuntu.com> wrote:
> > > > >
> > > > > On Tue, Apr 27, 2021 at 01:37:28PM +0300, Alexander Mihalicyn wrote:
> > > > > > On Tue, Apr 27, 2021 at 1:28 PM Christian Brauner
> > > > > > <christian.brauner at ubuntu.com> wrote:
> > > > > > >
> > > > > > > On Tue, Apr 27, 2021 at 12:03:17PM +0200, Krzysztof Kozlowski wrote:
> > > > > > > > On 27/04/2021 11:46, Christian Brauner wrote:
> > > > > > > > > On Mon, Apr 26, 2021 at 11:11:21AM +0300, alexander at mihalicyn.com wrote:
> > > > > > > > >> From: Alexander Mikhalitsyn <alexander at mihalicyn.com>
> > > > > > > > >>
> > > > > > > > >> BugLink: https://bugs.launchpad.net/bugs/1857257
> > > > > > > > >>
> > > > > > > > >> The hack was introduced in ("UBUNTU: SAUCE: overlayfs: allow with
> > > > > > > > >> shiftfs as underlay") and it broke checkpoint/restore of docker
> > > > > > > > >> contains:
> > > > > > > > >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1857257
> > > > > > > > >>
> > > > > > > > >> The following script can be used to trigger the issue:
> > > > > > > > >> #!/bin/bash
> > > > > > > > >>
> > > > > > > > >> cat > test.py << EOF
> > > > > > > > >> import sys
> > > > > > > > >>
> > > > > > > > >> f = open("/proc/self/maps")
> > > > > > > > >>
> > > > > > > > >> for l in f.readlines():
> > > > > > > > >> if "python" not in l:
> > > > > > > > >> continue
> > > > > > > > >> print(l)
> > > > > > > > >> s = l.split()
> > > > > > > > >> start, end = s[0].split("-")
> > > > > > > > >> fname = s[-1]
> > > > > > > > >> print(start, end, fname)
> > > > > > > > >> break
> > > > > > > > >> else:
> > > > > > > > >> sys.exit(1)
> > > > > > > > >>
> > > > > > > > >> test_file1 = open(fname)
> > > > > > > > >> test_file2 = open("/proc/self/map_files/%s-%s" % (start, end))
> > > > > > > > >>
> > > > > > > > >> fdinfo1 = open("/proc/self/fdinfo/%d" % test_file1.fileno()).read()
> > > > > > > > >> fdinfo2 = open("/proc/self/fdinfo/%d" % test_file2.fileno()).read()
> > > > > > > > >>
> > > > > > > > >> if fdinfo1 != fdinfo2:
> > > > > > > > >> print("FAIL")
> > > > > > > > >> print(test_file1)
> > > > > > > > >> print(fdinfo1)
> > > > > > > > >> print(test_file2)
> > > > > > > > >> print(fdinfo2)
> > > > > > > > >> sys.exit(1)
> > > > > > > > >> print("PASS")
> > > > > > > > >> EOF
> > > > > > > > >> sudo docker run -it --privileged --rm -v `pwd`:/mnt python python /mnt/test.py
> > > > > > > > >>
> > > > > > > > >> Thanks to Andrei Vagin for the reproducer and investigation of this problem.
> > > > > > > > >>
> > > > > > > > >> Cc: Andrei Vagin <avagin at gmail.com>
> > > > > > > > >> Cc: Adrian Reber <areber at redhat.com>
> > > > > > > > >> Cc: Christian Brauner <christian.brauner at ubuntu.com>
> > > > > > > > >> Cc: Stefan Bader <stefan.bader at canonical.com>
> > > > > > > > >> Cc: Kleber Sacilotto de Souza <kleber.souza at canonical.com>
> > > > > > > > >> Cc: Krzysztof Kozlowski <krzysztof.kozlowski at canonical.com>
> > > > > > > > >>
> > > > > > > > >> Fixes: d24b8a5 ("UBUNTU: SAUCE: overlayfs: allow with shiftfs as underlay")
> > > > > > > > >> Signed-off-by: Alexander Mikhalitsyn <alexander at mihalicyn.com>
> > > > > > > > >> ---
> > > > > > > > >
> > > > > > > > > Hey,
> > > > > > > > >
> > > > > > > > > Thanks for the patch!
> > > > > > > > > Fwiw, Andrei already tried to fix this a while ago but it caused another
> > > > > > > > > regression which forced us to revert the fix for this issue.
> > > > > > > >
> > > > > > > > Did you mean you tried this patch and it caused regressions?
> > > > > > >
> > > > > > > No, we had an earlier fix for this issue which didn't depend on aufs
> > > > > > > that Andrei sent:
> > > > > > >
> > > > > > > commit 9ac77e721a0edb87cbbe95f9e75d8e27d96cb051
> > > > > > > Author: Kleber Sacilotto de Souza <kleber.souza at canonical.com>
> > > > > > > Date: Thu May 21 12:55:43 2020 +0200
> > > > > > >
> > > > > > > Revert "UBUNTU: SAUCE: overlayfs: use shiftfs hacks only with shiftfs as underlay"
> > > > > > >
> > > > > > > BugLink: https://bugs.launchpad.net/bugs/1879690
> > > > > > >
> > > > > > > This reverts commit 6f18a8434050333afc80b5bfce2e0e994c86b790.
> > > > > > >
> > > > > > > The change applied for LP: #1857257 and its followup fix LP: #1876645
> > > > > > > introduced a regression on overlayfs. Revert these commits for now.
> > > > > > >
> > > > > > > Signed-off-by: Kleber Sacilotto de Souza <kleber.souza at canonical.com>
> > > > > > > Acked-by: Colin Ian King <colin.king at canonical.com>
> > > > > > > Acked-by: Khalid Elmously <khalid.elmously at canonical.com>
> > > > > > > Signed-off-by: Kleber Sacilotto de Souza <kleber.souza at canonical.com>
> > > > > > >
> > > > > > > Shifts is trying to overcome a shortcoming in overlayfs itself where
> > > > > > > overlayfs is constructing a fake path where fake path means mnt and
> > > > > > > dentry from the overlay but inode from another (lower or upper)
> > > > > > > filesystem which is a hack to workaround related issues to the one we're
> > > > > > > doing here.
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > >> fs/overlayfs/file.c | 29 +++++++++++++++++++++++++++++
> > > > > > > > >> 1 file changed, 29 insertions(+)
> > > > > > > > >>
> > > > > > > > >> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > > > > > > > >> index 0d3ea0cf3e98..5fa520d0798e 100644
> > > > > > > > >> --- a/fs/overlayfs/file.c
> > > > > > > > >> +++ b/fs/overlayfs/file.c
> > > > > > > > >> @@ -325,6 +325,18 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
> > > > > > > > >> return ret;
> > > > > > > > >> }
> > > > > > > > >>
> > > > > > > > >> +/* handle vma->vm_prfile */
> > > > > > > > >> +static void ovl_vm_prfile_set(struct vm_area_struct *vma,
> > > > > > > > >> + struct file *file)
> > > > > > > > >> +{
> > > > > > > > >> + get_file(file);
> > > > > > > > >> + vma->vm_prfile = file;
> > > > > > > > >> +#ifndef CONFIG_MMU
> > > > > > > > >> + get_file(file);
> > > > > > > > >> + vma->vm_region->vm_prfile = file;
> > > > > > > > >> +#endif
> > > > > > > > >> +}
> > > > > > > > >
> > > > > > > > > I'm confused struct vm_area_struct doesn't have a vm_prfile entry. Can
> > > > > > > > > you explain, please?
> > > > > > > >
> > > > > > > > It has. The aufs patchset brought it and basically this patch depends on
> > > > > > > > aufs.
> > > > > > >
> > > > > > > I meant upstream but good to know.
> > > > > > > Btw, why do we still carry aufs?
> > > > > > > This is not my business but I fear this will mean you're committing to
> > > > > > > carrying either aufs or a specific aufs patch with Ubuntu for the sake
> > > > > > > of overlayfs even if you wanted to drop aufs in the future.
> > > > > >
> > > > > > If the aufs will be dropped from the Ubuntu kernel, this part with
> > > > > > vma_pr_or_file() and additional
> > > > > > vma field can be safely saved in the kernel.
> > > > > > It's an independent part (mm/prfile.c + several small changes in
> > > > > > proc). So, it's safe to keep
> > > > > > it in the kernel for overlayfs/shiftfs needs.
> > > > > >
> > > > > > It's also important to notice that this problem with incorrect
> > > > > > map_files mnt_id is also valid
> > > > > > for current shiftfs implementation. If we decide to take this
> > > > > > particular patch for overlayfs,
> > > > > > I will prepare an analogical fix for the shiftfs.
> > > > >
> > > > > So what's the wrong mnt_id here, i.e. is criu confused because it sees
> > > > > the lower fs's mnt_id when it should see the mnt_id of shiftfs/overlayfs
> > > > > itself or the other way around?
> > > >
> > > > Yes. If we mmap() file which was opened through shiftfs/overlayfs we
> > > > expect that if we reopen the same file but through
> > > > /proc/<pid>/map_files/0x111-0x222
> > > > we get file descriptor from the same mount but not from the underlying
> > > > filesystem. I'm not
> > > > sure but this weird behaviour theoretically may also cause security issues.
> > >
> > > Stacking filesystems are crap is the summary you're looking for.
> > >
> > > Jokes aside, this is I think the reason for the overlayfs fake-path
> > > quackery, i.e. combining mount and dentry from overlayfs with a
> > > non-overlayfs inode. If your patch doesn't regress anything and we carry
> > > that aufs crap anyway then fine with me. Though I think this needs to be
> > > fixed in the upstream kernel, i.e. overlayfs shouldn't need to do stuff
> > > like this in the first place.
Yes, I think the reason why we have
realfile = open_with_fake_path(&file->f_path, flags, realinode, current_cred());
in ovl_open_realfile() in upstream version of the overlayfs
is to make things like /proc/<pid>/map_files/ work correctly. But
speaking honestly,
I think that the way that the aufs solves this problem by adding
additional field on struct vm_area_struct
looks clearer. When working with stacking filesystem we really have
two files - one for the mm needs and the second
to determine the real origin of the memory mapping (from which fd it
was mmapped).
Maybe we need just port that part of the aufs (mm/prfile.c) for the
upstream kernel? I think it's worth to come up
with a discussion on the unionfs mailing list. I believe that Amir and
Miklos will say something interesting about that problem.
> >
> > Oh, and we totally missed Seth here. Adding him.
>
> And thank you for working on this fix, Alexander.
Thank you for your attention to the problem and patches! ;)
More information about the CRIU
mailing list