[CRIU] [SRU][F][G][H][PATCH v2] UBUNTU: SAUCE: overlayfs: fix incorrect mnt_id of files opened from map_files
Christian Brauner
christian.brauner at ubuntu.com
Tue Apr 27 14:27:08 MSK 2021
On Tue, Apr 27, 2021 at 01:47:17PM +0300, Alexander Mihalicyn wrote:
> On Tue, Apr 27, 2021 at 1:43 PM Christian Brauner
> <christian.brauner at ubuntu.com> wrote:
> >
> > On Tue, Apr 27, 2021 at 01:37:28PM +0300, Alexander Mihalicyn wrote:
> > > On Tue, Apr 27, 2021 at 1:28 PM Christian Brauner
> > > <christian.brauner at ubuntu.com> wrote:
> > > >
> > > > On Tue, Apr 27, 2021 at 12:03:17PM +0200, Krzysztof Kozlowski wrote:
> > > > > On 27/04/2021 11:46, Christian Brauner wrote:
> > > > > > On Mon, Apr 26, 2021 at 11:11:21AM +0300, alexander at mihalicyn.com wrote:
> > > > > >> From: Alexander Mikhalitsyn <alexander at mihalicyn.com>
> > > > > >>
> > > > > >> BugLink: https://bugs.launchpad.net/bugs/1857257
> > > > > >>
> > > > > >> The hack was introduced in ("UBUNTU: SAUCE: overlayfs: allow with
> > > > > >> shiftfs as underlay") and it broke checkpoint/restore of docker
> > > > > >> contains:
> > > > > >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1857257
> > > > > >>
> > > > > >> The following script can be used to trigger the issue:
> > > > > >> #!/bin/bash
> > > > > >>
> > > > > >> cat > test.py << EOF
> > > > > >> import sys
> > > > > >>
> > > > > >> f = open("/proc/self/maps")
> > > > > >>
> > > > > >> for l in f.readlines():
> > > > > >> if "python" not in l:
> > > > > >> continue
> > > > > >> print(l)
> > > > > >> s = l.split()
> > > > > >> start, end = s[0].split("-")
> > > > > >> fname = s[-1]
> > > > > >> print(start, end, fname)
> > > > > >> break
> > > > > >> else:
> > > > > >> sys.exit(1)
> > > > > >>
> > > > > >> test_file1 = open(fname)
> > > > > >> test_file2 = open("/proc/self/map_files/%s-%s" % (start, end))
> > > > > >>
> > > > > >> fdinfo1 = open("/proc/self/fdinfo/%d" % test_file1.fileno()).read()
> > > > > >> fdinfo2 = open("/proc/self/fdinfo/%d" % test_file2.fileno()).read()
> > > > > >>
> > > > > >> if fdinfo1 != fdinfo2:
> > > > > >> print("FAIL")
> > > > > >> print(test_file1)
> > > > > >> print(fdinfo1)
> > > > > >> print(test_file2)
> > > > > >> print(fdinfo2)
> > > > > >> sys.exit(1)
> > > > > >> print("PASS")
> > > > > >> EOF
> > > > > >> sudo docker run -it --privileged --rm -v `pwd`:/mnt python python /mnt/test.py
> > > > > >>
> > > > > >> Thanks to Andrei Vagin for the reproducer and investigation of this problem.
> > > > > >>
> > > > > >> Cc: Andrei Vagin <avagin at gmail.com>
> > > > > >> Cc: Adrian Reber <areber at redhat.com>
> > > > > >> Cc: Christian Brauner <christian.brauner at ubuntu.com>
> > > > > >> Cc: Stefan Bader <stefan.bader at canonical.com>
> > > > > >> Cc: Kleber Sacilotto de Souza <kleber.souza at canonical.com>
> > > > > >> Cc: Krzysztof Kozlowski <krzysztof.kozlowski at canonical.com>
> > > > > >>
> > > > > >> Fixes: d24b8a5 ("UBUNTU: SAUCE: overlayfs: allow with shiftfs as underlay")
> > > > > >> Signed-off-by: Alexander Mikhalitsyn <alexander at mihalicyn.com>
> > > > > >> ---
> > > > > >
> > > > > > Hey,
> > > > > >
> > > > > > Thanks for the patch!
> > > > > > Fwiw, Andrei already tried to fix this a while ago but it caused another
> > > > > > regression which forced us to revert the fix for this issue.
> > > > >
> > > > > Did you mean you tried this patch and it caused regressions?
> > > >
> > > > No, we had an earlier fix for this issue which didn't depend on aufs
> > > > that Andrei sent:
> > > >
> > > > commit 9ac77e721a0edb87cbbe95f9e75d8e27d96cb051
> > > > Author: Kleber Sacilotto de Souza <kleber.souza at canonical.com>
> > > > Date: Thu May 21 12:55:43 2020 +0200
> > > >
> > > > Revert "UBUNTU: SAUCE: overlayfs: use shiftfs hacks only with shiftfs as underlay"
> > > >
> > > > BugLink: https://bugs.launchpad.net/bugs/1879690
> > > >
> > > > This reverts commit 6f18a8434050333afc80b5bfce2e0e994c86b790.
> > > >
> > > > The change applied for LP: #1857257 and its followup fix LP: #1876645
> > > > introduced a regression on overlayfs. Revert these commits for now.
> > > >
> > > > Signed-off-by: Kleber Sacilotto de Souza <kleber.souza at canonical.com>
> > > > Acked-by: Colin Ian King <colin.king at canonical.com>
> > > > Acked-by: Khalid Elmously <khalid.elmously at canonical.com>
> > > > Signed-off-by: Kleber Sacilotto de Souza <kleber.souza at canonical.com>
> > > >
> > > > Shifts is trying to overcome a shortcoming in overlayfs itself where
> > > > overlayfs is constructing a fake path where fake path means mnt and
> > > > dentry from the overlay but inode from another (lower or upper)
> > > > filesystem which is a hack to workaround related issues to the one we're
> > > > doing here.
> > > >
> > > > >
> > > > > >
> > > > > >> fs/overlayfs/file.c | 29 +++++++++++++++++++++++++++++
> > > > > >> 1 file changed, 29 insertions(+)
> > > > > >>
> > > > > >> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > > > > >> index 0d3ea0cf3e98..5fa520d0798e 100644
> > > > > >> --- a/fs/overlayfs/file.c
> > > > > >> +++ b/fs/overlayfs/file.c
> > > > > >> @@ -325,6 +325,18 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
> > > > > >> return ret;
> > > > > >> }
> > > > > >>
> > > > > >> +/* handle vma->vm_prfile */
> > > > > >> +static void ovl_vm_prfile_set(struct vm_area_struct *vma,
> > > > > >> + struct file *file)
> > > > > >> +{
> > > > > >> + get_file(file);
> > > > > >> + vma->vm_prfile = file;
> > > > > >> +#ifndef CONFIG_MMU
> > > > > >> + get_file(file);
> > > > > >> + vma->vm_region->vm_prfile = file;
> > > > > >> +#endif
> > > > > >> +}
> > > > > >
> > > > > > I'm confused struct vm_area_struct doesn't have a vm_prfile entry. Can
> > > > > > you explain, please?
> > > > >
> > > > > It has. The aufs patchset brought it and basically this patch depends on
> > > > > aufs.
> > > >
> > > > I meant upstream but good to know.
> > > > Btw, why do we still carry aufs?
> > > > This is not my business but I fear this will mean you're committing to
> > > > carrying either aufs or a specific aufs patch with Ubuntu for the sake
> > > > of overlayfs even if you wanted to drop aufs in the future.
> > >
> > > If the aufs will be dropped from the Ubuntu kernel, this part with
> > > vma_pr_or_file() and additional
> > > vma field can be safely saved in the kernel.
> > > It's an independent part (mm/prfile.c + several small changes in
> > > proc). So, it's safe to keep
> > > it in the kernel for overlayfs/shiftfs needs.
> > >
> > > It's also important to notice that this problem with incorrect
> > > map_files mnt_id is also valid
> > > for current shiftfs implementation. If we decide to take this
> > > particular patch for overlayfs,
> > > I will prepare an analogical fix for the shiftfs.
> >
> > So what's the wrong mnt_id here, i.e. is criu confused because it sees
> > the lower fs's mnt_id when it should see the mnt_id of shiftfs/overlayfs
> > itself or the other way around?
>
> Yes. If we mmap() file which was opened through shiftfs/overlayfs we
> expect that if we reopen the same file but through
> /proc/<pid>/map_files/0x111-0x222
> we get file descriptor from the same mount but not from the underlying
> filesystem. I'm not
> sure but this weird behaviour theoretically may also cause security issues.
Stacking filesystems are crap is the summary you're looking for.
Jokes aside, this is I think the reason for the overlayfs fake-path
quackery, i.e. combining mount and dentry from overlayfs with a
non-overlayfs inode. If your patch doesn't regress anything and we carry
that aufs crap anyway then fine with me. Though I think this needs to be
fixed in the upstream kernel, i.e. overlayfs shouldn't need to do stuff
like this in the first place.
More information about the CRIU
mailing list