[Devel] [PATCH RHEL COMMIT] overlayfs: add dynamic path resolving in mount options
Konstantin Khorenko
khorenko at virtuozzo.com
Mon Oct 4 20:39:04 MSK 2021
The commit is pushed to "branch-rh9-5.14.vz9.1.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after ark-5.14
------>
commit 7efc9ed53a3ed76de2e5881f4f2ebfd80d88d504
Author: Alexander Mikhalitsyn <alexander.mikhalitsyn at virtuozzo.com>
Date: Mon Oct 4 20:39:04 2021 +0300
overlayfs: add dynamic path resolving in mount options
This patch adds the OVERLAY_FS_DYNAMIC_RESOLVE_PATH_OPTIONS
compile-time option and the "dyn_path_opts" runtime module option.
These options control the "dynamic path resolving in lowerdir,
upperdir, workdir mount options" mode. If enabled, the user sees
real full paths, relative to the mount namespace, in the lowerdir,
upperdir and workdir options (/proc/mounts, /proc/<pid>/mountinfo).
This patch is very helpful for checkpoint/restore of overlayfs
mounts. With this patch and CRIU it becomes possible to C/R
Docker containers that use the overlayfs storage driver.
Note: the d_path function from dcache.c is used to resolve the full
path in the mount namespace. This function also adds a "(deleted)"
suffix if the dentry was deleted. So, if one of the dentries in the
lowerdir, upperdir, workdir options is deleted, we will see the
"(deleted)" suffix in the corresponding path.
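A user-space consumer of these options may want to detect and strip that suffix. The following is a minimal sketch, not part of the patch. It assumes that, because print_path_option() passes a space in the escape set to seq_path(), the suffix shows up with its leading space escaped as \040. The helper names unescape_octal() and strip_deleted_suffix() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Decode "\NNN" octal escapes in place (e.g. "\040" becomes a space). */
void unescape_octal(char *s)
{
	char *out = s;

	assert(s != NULL);
	while (*s) {
		if (s[0] == '\\' &&
		    s[1] >= '0' && s[1] <= '3' &&
		    s[2] >= '0' && s[2] <= '7' &&
		    s[3] >= '0' && s[3] <= '7') {
			*out++ = (char)(((s[1] - '0') << 6) |
					((s[2] - '0') << 3) |
					(s[3] - '0'));
			s += 4;
		} else {
			*out++ = *s++;
		}
	}
	*out = '\0';
}

/*
 * Hypothetical helper: strip a trailing " (deleted)" from an already
 * unescaped path and report whether the suffix was present.
 */
bool strip_deleted_suffix(char *path)
{
	static const char suffix[] = " (deleted)";
	size_t slen = sizeof(suffix) - 1;
	size_t len;

	assert(path != NULL);
	len = strlen(path);
	if (len < slen || strcmp(path + len - slen, suffix) != 0)
		return false;
	path[len - slen] = '\0';
	return true;
}
```

Note that a directory whose real name ends in " (deleted)" cannot be told apart from a deleted one this way, so a caller should treat the result as a hint.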
https://jira.sw.ru/browse/PSBM-58614
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn at virtuozzo.com>
=====================
Patchset description:
overlayfs: C/R enhancements
This patchset aims to make C/R of overlayfs mounts with CRIU possible.
We introduce two new overlayfs module options -- dyn_path_opts and
mnt_id_path_opts. If enabled, these options make it possible to see real
*full* paths in the lowerdir, workdir, upperdir options, and also mnt_ids
for the corresponding paths.
These changes should not break anything because for showing mnt_ids we simply
introduce new show-time mount options. And for paths we simply *always*
provide *full paths* instead of relative paths in mountinfo.
BEFORE
overlay on /var/lib/docker/overlay2/XYZ/merged type overlay (rw,relatime,
lowerdir=/var/lib/docker/overlay2/XYZ-init/diff:/var/lib/docker/overlay2/
ABC/diff,upperdir=/var/lib/docker/overlay2/XYZ/diff,workdir=/var/lib/docker
/overlay2/XYZ/work)
none on /sys type sysfs (rw,relatime)
AFTER
overlay on /var/lib/docker/overlay2/XYZ/merged type overlay (rw,relatime,
lowerdir=/var/lib/docker/overlay2/XYZ-init/diff:/var/lib/docker/overlay2/
ABC/diff,upperdir=/var/lib/docker/overlay2/XYZ/diff,workdir=/var/lib/docker
/overlay2/XYZ/work,lowerdir_mnt_id=175:175,upperdir_mnt_id=175)
none on /sys type sysfs (rw,relatime)
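To illustrate how a tool could consume the AFTER output, here is a minimal user-space sketch that extracts one option value from the comma-separated option string and counts its ':'-separated entries, so that lowerdir can be paired with lowerdir_mnt_id. The helpers get_mount_opt() and count_entries() are hypothetical and not part of the patchset, and the sketch assumes that separators inside paths are escaped and therefore never appear literally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the value of "key=" from a comma-separated
 * option string into out. Returns false if the key is absent or the
 * value does not fit into out.
 */
bool get_mount_opt(const char *opts, const char *key, char *out, size_t outsz)
{
	size_t klen;

	assert(opts && key && out && outsz > 0);
	klen = strlen(key);

	while (*opts) {
		const char *end = strchr(opts, ',');
		size_t len = end ? (size_t)(end - opts) : strlen(opts);

		/* Match "key=" exactly, so "lowerdir" skips "lowerdir_mnt_id". */
		if (len > klen && strncmp(opts, key, klen) == 0 &&
		    opts[klen] == '=') {
			size_t vlen = len - klen - 1;

			if (vlen >= outsz)
				return false;
			memcpy(out, opts + klen + 1, vlen);
			out[vlen] = '\0';
			return true;
		}
		opts += len;
		if (*opts == ',')
			opts++;
	}
	return false;
}

/* Count the ':'-separated entries of a non-empty list value. */
size_t count_entries(const char *list)
{
	size_t n = 1;

	assert(list != NULL);
	for (; *list; list++)
		if (*list == ':')
			n++;
	return n;
}
```

With the AFTER line above, both lowerdir and lowerdir_mnt_id yield two entries, which lets a caller match each lower path to its mnt_id by position.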
Alexander Mikhalitsyn (2):
overlayfs: add dynamic path resolving in mount options
overlayfs: add mnt_id paths options
=====================
Rebase to RHEL8.3 kernel-4.18.0-240.1.1.el8_3 note:
- original patch from vz8 kernel has been dropped (did not apply):
1f701048e75e ("overlayfs: add dynamic path resolving in mount options")
- a patchset developed for mainstream has been applied
(it's not accepted in ms yet):
https://lore.kernel.org/lkml/20200604161133.20949-1-alexander.mikhalitsyn@virtuozzo.com/
+++
fs/ovelayfs: Fix crash on overlayfs mount
Kdump kernel fails to load because of crash on mount of overlayfs:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
....
Call Trace:
seq_path+0x64/0xb0
print_paths_option+0x79/0xa0
ovl_show_options+0x3a/0x320
show_mountinfo+0x1ee/0x290
seq_read+0x2f8/0x400
vfs_read+0x9d/0x150
ksys_read+0x4f/0xb0
do_syscall_64+0x5b/0x1a0
This is caused by an out-of-bounds access to ofs->lowerpaths.
We pass ofs->numlayer to print_paths_option() as the size of the
->lowerpaths array, but that is not its size.
The correct number of lowerpaths elements is ->numlower in 'struct ovl_entry'.
So move lowerpaths there and use oe->numlower as the array size.
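The size mismatch can be modelled in user space. This is only a sketch with stand-in structures (model_ovl_fs, model_ovl_entry and model_join_paths() are hypothetical names). It assumes that numlayer is one larger than numlower for the root entry, which is inferred from the `&ofs->layers[i+1]` indexing in ovl_get_lowerstack() rather than stated in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* User-space stand-ins for the kernel structures; field names follow the patch. */
struct model_ovl_fs {
	unsigned int numlayer;		/* assumed: slot 0 plus one slot per lower layer */
};

struct model_ovl_entry {
	unsigned int numlower;		/* number of valid lowerpaths[] elements */
	const char *const *lowerpaths;
};

/*
 * Join num paths with ':' in the same order as print_paths_option().
 * Returns the number of bytes written, excluding the terminating NUL.
 * The caller must pass the real element count (numlower); passing
 * numlayer would read one element past the end of the array.
 */
size_t model_join_paths(char *out, size_t outsz,
			const char *const *paths, unsigned int num)
{
	size_t used = 0;
	unsigned int i;

	assert(out && outsz > 0 && (paths || num == 0));
	out[0] = '\0';
	for (i = 0; i < num; i++) {
		int n = snprintf(out + used, outsz - used, "%s%s",
				 i ? ":" : "", paths[i]);

		if (n < 0 || (size_t)n >= outsz - used) {
			out[used] = '\0';	/* drop the entry that did not fit */
			break;
		}
		used += (size_t)n;
	}
	return used;
}
```

Keeping the array pointer next to the counter that actually describes it, as the fix does by moving lowerpaths into struct ovl_entry, makes this kind of mismatch harder to reintroduce.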
mFixes: 17fc61697f73 ("overlayfs: add dynamic path resolving in mount options")
mFixes: 2191d729083d ("overlayfs: add mnt_id paths options")
https://jira.sw.ru/browse/PSBM-123508
Signed-off-by: Andrey Ryabinin <aryabinin at virtuozzo.com>
Reviewed-by: Alexander Mikhalitsyn <alexander.mikhalitsyn at virtuozzo.com>
+++
fs/overlayfs: Fix crash on overlayfs mount
[ 261.403900] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 261.412847] Call Trace:
[ 261.413463] seq_path+0x3c/0xa0
[ 261.414090] print_paths_option+0x8c/0xa0
[ 261.414736] ovl_show_options+0x41/0x320
[ 261.415378] show_mountinfo+0x1df/0x2b0
[ 261.416019] seq_read+0x26e/0x3d0
[ 261.416644] vfs_read+0x89/0x140
[ 261.417269] ksys_read+0x52/0xc0
[ 261.418918] do_syscall_64+0x5b/0x1e0
[ 261.419580] entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 261.420256] RIP: 0033:0x7f20b59f28e4
The problem is that we take the overlayfs lower layers info from a
dentry that is not the root dentry. Non-root dentries can have fewer
layers than the root dentry.
Crash reproducer:
mkdir {lower,upper,work,merged}
touch lower/lower
touch upper/upper
touch lowermnt
touch uppermnt
mount -t overlay overlay -o lowerdir=lower,upperdir=upper,workdir=work merged
mount --bind merged/upper uppermnt
mount --bind merged/lower lowermnt
mFixes: 4267859a0 ("fs/ovelayfs: Fix crash on overlayfs mount")
https://jira.sw.ru/browse/PSBM-129333
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn at virtuozzo.com>
Rebased to vz9:
- Changed several new workbasedir references
(cherry picked from vz8 commit b110e82c785e366851d487cd50a33caa20c7bbb6)
Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko at virtuozzo.com>
---
fs/overlayfs/Kconfig | 31 +++++++++++++++++
fs/overlayfs/overlayfs.h | 4 +++
fs/overlayfs/ovl_entry.h | 6 ++--
fs/overlayfs/super.c | 86 ++++++++++++++++++++++++++++--------------------
fs/overlayfs/util.c | 21 ++++++++++++
5 files changed, 110 insertions(+), 38 deletions(-)
diff --git a/fs/overlayfs/Kconfig b/fs/overlayfs/Kconfig
index dd188c7996b3..be733bcd4c00 100644
--- a/fs/overlayfs/Kconfig
+++ b/fs/overlayfs/Kconfig
@@ -124,3 +124,34 @@ config OVERLAY_FS_METACOPY
that doesn't support this feature will have unexpected results.
If unsure, say N.
+
+config OVERLAY_FS_DYNAMIC_RESOLVE_PATH_OPTIONS
+ bool "Overlayfs: all mount paths options resolves dynamically on options show"
+ default y
+ depends on OVERLAY_FS
+ help
+ This option helps checkpoint/restore of overlayfs mounts.
+ If N selected, old behavior is saved. In this case lowerdir, upperdir,
+ workdir options shows in /proc/fd/mountinfo, /proc/mounts as it given
+ by user on mount. User may specify relative paths in these options, then
+ we couldn't determine from options which full paths correspond these
+ relative paths. Also, after pivot_root syscall these paths (even full)
+ will not rebuild according to root change.
+
+ If this config option is enabled then overlay filesystems lowerdir, upperdir,
+ workdir options paths will dynamically recalculated as full paths in corresponding
+ mount namespaces by default.
+
+ It's also possible to change this behavior on overlayfs module loading or
+ through sysfs (dyn_path_opts parameter).
+
+ Disable this to get a backward compatible with previous kernels configuration,
+ but in this case checkpoint/restore functionality for overlayfs mounts
+ will not work.
+
+ If backward compatibility is not an issue, then it is safe and
+ recommended to say Y here.
+
+ For more information, see Documentation/filesystems/overlayfs.txt
+
+ If unsure, say N.
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 6ec73db4bf9e..d30e097fcea5 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -352,6 +352,10 @@ static inline bool ovl_test_flag(unsigned long flag, struct inode *inode)
return test_bit(flag, &OVL_I(inode)->flags);
}
+void print_path_option(struct seq_file *m, const char *name, struct path *path);
+void print_paths_option(struct seq_file *m, const char *name,
+ struct path *paths, unsigned int num);
+
static inline bool ovl_is_impuredir(struct super_block *sb,
struct dentry *dentry)
{
diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 63efee554f69..be8a035a209f 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -49,13 +49,14 @@ struct ovl_path {
/* private information held for overlayfs's superblock */
struct ovl_fs {
+ struct path upperpath;
unsigned int numlayer;
/* Number of unique fs among layers including upper fs */
unsigned int numfs;
const struct ovl_layer *layers;
struct ovl_sb *fs;
- /* workbasedir is the path at workdir= mount option */
- struct dentry *workbasedir;
+ /* workbasepath is the path at workdir= mount option */
+ struct path workbasepath;
/* workdir is the 'work' directory under workbasedir */
struct dentry *workdir;
/* index directory listing overlay inodes by origin file handle */
@@ -109,6 +110,7 @@ struct ovl_entry {
struct rcu_head rcu;
};
unsigned numlower;
+ struct path *lowerpaths;
struct ovl_path lowerstack[];
};
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 74571ad7ef4f..fdb0d9a45104 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -53,12 +53,20 @@ module_param_named(xino_auto, ovl_xino_auto_def, bool, 0644);
MODULE_PARM_DESC(xino_auto,
"Auto enable xino feature");
+static bool ovl_dyn_path_opts = IS_ENABLED(CONFIG_OVERLAY_FS_DYNAMIC_RESOLVE_PATH_OPTIONS);
+module_param_named(dyn_path_opts, ovl_dyn_path_opts, bool, 0644);
+MODULE_PARM_DESC(dyn_path_opts, "dyn_path_opts feature enabled");
+
static void ovl_entry_stack_free(struct ovl_entry *oe)
{
unsigned int i;
- for (i = 0; i < oe->numlower; i++)
+ for (i = 0; i < oe->numlower; i++) {
dput(oe->lowerstack[i].dentry);
+ if (oe->lowerpaths)
+ path_put(&oe->lowerpaths[i]);
+ }
+ kfree(oe->lowerpaths);
}
static bool ovl_metacopy_def = IS_ENABLED(CONFIG_OVERLAY_FS_METACOPY);
@@ -224,17 +232,20 @@ static void ovl_free_fs(struct ovl_fs *ofs)
dput(ofs->indexdir);
dput(ofs->workdir);
if (ofs->workdir_locked)
- ovl_inuse_unlock(ofs->workbasedir);
- dput(ofs->workbasedir);
+ ovl_inuse_unlock(ofs->workbasepath.dentry);
+ path_put(&ofs->workbasepath);
if (ofs->upperdir_locked)
ovl_inuse_unlock(ovl_upper_mnt(ofs)->mnt_root);
+ path_put(&ofs->upperpath);
+
/* Hack! Reuse ofs->layers as a vfsmount array before freeing it */
mounts = (struct vfsmount **) ofs->layers;
for (i = 0; i < ofs->numlayer; i++) {
iput(ofs->layers[i].trap);
mounts[i] = ofs->layers[i].mnt;
}
+
kern_unmount_array(mounts, ofs->numlayer);
kfree(ofs->layers);
for (i = 0; i < ofs->numfs; i++)
@@ -356,11 +367,20 @@ static int ovl_show_options(struct seq_file *m, struct dentry *dentry)
{
struct super_block *sb = dentry->d_sb;
struct ovl_fs *ofs = sb->s_fs_info;
+ struct ovl_entry *oe = OVL_E(sb->s_root);
- seq_show_option(m, "lowerdir", ofs->config.lowerdir);
- if (ofs->config.upperdir) {
- seq_show_option(m, "upperdir", ofs->config.upperdir);
- seq_show_option(m, "workdir", ofs->config.workdir);
+ if (ovl_dyn_path_opts) {
+ print_paths_option(m, "lowerdir", oe->lowerpaths, oe->numlower);
+ if (ofs->config.upperdir) {
+ print_path_option(m, "upperdir", &ofs->upperpath);
+ print_path_option(m, "workdir", &ofs->workbasepath);
+ }
+ } else {
+ seq_show_option(m, "lowerdir", ofs->config.lowerdir);
+ if (ofs->config.upperdir) {
+ seq_show_option(m, "upperdir", ofs->config.upperdir);
+ seq_show_option(m, "workdir", ofs->config.workdir);
+ }
}
if (ofs->config.default_permissions)
seq_puts(m, ",default_permissions");
@@ -753,7 +773,7 @@ static int ovl_parse_opt(char *opt, struct ovl_config *config)
static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
const char *name, bool persist)
{
- struct inode *dir = ofs->workbasedir->d_inode;
+ struct inode *dir = ofs->workbasepath.dentry->d_inode;
struct vfsmount *mnt = ovl_upper_mnt(ofs);
struct dentry *work;
int err;
@@ -761,7 +781,7 @@ static struct dentry *ovl_workdir_create(struct ovl_fs *ofs,
inode_lock_nested(dir, I_MUTEX_PARENT);
retry:
- work = lookup_one_len(name, ofs->workbasedir, strlen(name));
+ work = lookup_one_len(name, ofs->workbasepath.dentry, strlen(name));
if (!IS_ERR(work)) {
struct iattr attr = {
@@ -1332,7 +1352,7 @@ static struct dentry *ovl_lookup_or_create(struct dentry *parent,
static int ovl_create_volatile_dirty(struct ovl_fs *ofs)
{
unsigned int ctr;
- struct dentry *d = dget(ofs->workbasedir);
+ struct dentry *d = dget(ofs->workbasepath.dentry);
static const char *const volatile_path[] = {
OVL_WORKDIR_NAME, "incompat", "volatile", "dirty"
};
@@ -1348,7 +1368,7 @@ static int ovl_create_volatile_dirty(struct ovl_fs *ofs)
}
static int ovl_make_workdir(struct super_block *sb, struct ovl_fs *ofs,
- struct path *workpath)
+ struct path *workbasepath)
{
struct vfsmount *mnt = ovl_upper_mnt(ofs);
struct dentry *temp, *workdir;
@@ -1378,7 +1398,7 @@ static int ovl_make_workdir(struct super_block *sb, struct ovl_fs *ofs,
* workdir. This check requires successful creation of workdir in
* previous step.
*/
- err = ovl_check_d_type_supported(workpath);
+ err = ovl_check_d_type_supported(workbasepath);
if (err < 0)
goto out;
@@ -1477,25 +1497,22 @@ static int ovl_get_workdir(struct super_block *sb, struct ovl_fs *ofs,
struct path *upperpath)
{
int err;
- struct path workpath = { };
- err = ovl_mount_dir(ofs->config.workdir, &workpath);
+ err = ovl_mount_dir(ofs->config.workdir, &ofs->workbasepath);
if (err)
goto out;
err = -EINVAL;
- if (upperpath->mnt != workpath.mnt) {
+ if (upperpath->mnt != ofs->workbasepath.mnt) {
pr_err("workdir and upperdir must reside under the same mount\n");
goto out;
}
- if (!ovl_workdir_ok(workpath.dentry, upperpath->dentry)) {
+ if (!ovl_workdir_ok(ofs->workbasepath.dentry, upperpath->dentry)) {
pr_err("workdir and upperdir must be separate subtrees\n");
goto out;
}
- ofs->workbasedir = dget(workpath.dentry);
-
- if (ovl_inuse_trylock(ofs->workbasedir)) {
+ if (ovl_inuse_trylock(ofs->workbasepath.dentry)) {
ofs->workdir_locked = true;
} else {
err = ovl_report_in_use(ofs, "workdir");
@@ -1503,15 +1520,14 @@ static int ovl_get_workdir(struct super_block *sb, struct ovl_fs *ofs,
goto out;
}
- err = ovl_setup_trap(sb, ofs->workbasedir, &ofs->workbasedir_trap,
+ err = ovl_setup_trap(sb, ofs->workbasepath.dentry, &ofs->workbasedir_trap,
"workdir");
if (err)
goto out;
- err = ovl_make_workdir(sb, ofs, &workpath);
+ err = ovl_make_workdir(sb, ofs, &ofs->workbasepath);
out:
- path_put(&workpath);
return err;
}
@@ -1835,15 +1851,17 @@ static struct ovl_entry *ovl_get_lowerstack(struct super_block *sb,
oe->lowerstack[i].dentry = dget(stack[i].dentry);
oe->lowerstack[i].layer = &ofs->layers[i+1];
}
+ oe->lowerpaths = stack;
out:
- for (i = 0; i < numlower; i++)
- path_put(&stack[i]);
- kfree(stack);
-
return oe;
out_err:
+ if (stack) {
+ for (i = 0; i < numlower; i++)
+ path_put(&stack[i]);
+ kfree(stack);
+ }
oe = ERR_PTR(err);
goto out;
}
@@ -1904,8 +1922,7 @@ static int ovl_check_overlapping_layers(struct super_block *sb,
* workbasedir. In that case, we already have their traps in
* inode cache and we will catch that case on lookup.
*/
- err = ovl_check_layer(sb, ofs, ofs->workbasedir, "workdir",
- false);
+ err = ovl_check_layer(sb, ofs, ofs->workbasepath.dentry, "workdir", false);
if (err)
return err;
}
@@ -1930,7 +1947,7 @@ static struct dentry *ovl_get_root(struct super_block *sb,
unsigned long ino = d_inode(lowerpath->dentry)->i_ino;
int fsid = lowerpath->layer->fsid;
struct ovl_inode_params oip = {
- .upperdentry = upperdentry,
+ .upperdentry = dget(upperdentry),
.lowerpath = lowerpath,
};
@@ -1961,7 +1978,6 @@ static struct dentry *ovl_get_root(struct super_block *sb,
static int ovl_fill_super(struct super_block *sb, void *data, int silent)
{
- struct path upperpath = { };
struct dentry *root_dentry;
struct ovl_entry *oe;
struct ovl_fs *ofs;
@@ -2052,7 +2068,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
goto out_err;
}
- err = ovl_get_upper(sb, ofs, &layers[0], &upperpath);
+ err = ovl_get_upper(sb, ofs, &layers[0], &ofs->upperpath);
if (err)
goto out_err;
@@ -2066,7 +2082,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
}
}
- err = ovl_get_workdir(sb, ofs, &upperpath);
+ err = ovl_get_workdir(sb, ofs, &ofs->upperpath);
if (err)
goto out_err;
@@ -2091,7 +2107,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
}
if (!ovl_force_readonly(ofs) && ofs->config.index) {
- err = ovl_get_indexdir(sb, ofs, oe, &upperpath);
+ err = ovl_get_indexdir(sb, ofs, oe, &ofs->upperpath);
if (err)
goto out_free_oe;
@@ -2132,11 +2148,10 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
sb->s_iflags |= SB_I_SKIP_SYNC;
err = -ENOMEM;
- root_dentry = ovl_get_root(sb, upperpath.dentry, oe);
+ root_dentry = ovl_get_root(sb, ofs->upperpath.dentry, oe);
if (!root_dentry)
goto out_free_oe;
- mntput(upperpath.mnt);
kfree(splitlower);
sb->s_root = root_dentry;
@@ -2148,7 +2163,6 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
kfree(oe);
out_err:
kfree(splitlower);
- path_put(&upperpath);
ovl_free_fs(ofs);
out:
return err;
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index b9d03627f364..29cf1947fd00 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -13,6 +13,7 @@
#include <linux/uuid.h>
#include <linux/namei.h>
#include <linux/ratelimit.h>
+#include <linux/seq_file.h>
#include "overlayfs.h"
int ovl_want_write(struct dentry *dentry)
@@ -976,3 +977,23 @@ int ovl_sync_status(struct ovl_fs *ofs)
return errseq_check(&mnt->mnt_sb->s_wb_err, ofs->errseq);
}
+
+void print_path_option(struct seq_file *m, const char *name, struct path *path)
+{
+ seq_show_option(m, name, "");
+ seq_path(m, path, ", \t\n\\");
+}
+
+void print_paths_option(struct seq_file *m, const char *name,
+ struct path *paths, unsigned int num)
+{
+ int i;
+
+ seq_show_option(m, name, "");
+
+ for (i = 0; i < num; i++) {
+ if (i)
+ seq_putc(m, ':');
+ seq_path(m, &paths[i], ", \t\n\\");
+ }
+}