<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=koi8-r">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
I did look into that matter, and here is my thoughts.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
So far if inode ever reach i_count == 0, iput will call iput_final,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
which will call <span style="color: rgb(0, 0, 0); font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt;">drop function sb->op->drop_inode, </span><span style="color: rgb(0, 0, 0); font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt;">set
I_FREEING status</span></div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span style="color: rgb(0, 0, 0); font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt;">and then call evict. I don't really think any sane code</span></div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span style="color: rgb(0, 0, 0); font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt;">will ever increment i_count back.</span></div>
<div id="appendonsend"></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>От:</b> Pavel Tikhomirov <ptikhomirov@virtuozzo.com><br>
<b>Отправлено:</b> 29 июля 2020 г. 13:46<br>
<b>Кому:</b> Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com>; devel@openvz.org <devel@openvz.org><br>
<b>Тема:</b> Re: [PATCH RH7 v3] cgroup: add export_operations to cgroup super block</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt">
<div class="PlainText">Except for two small nits it looks good.<br>
<br>
Note: Because we use atomic_inc_not_zero(i_count) and the fact that iput <br>
can increment i_count back, we can _potentially_ see situation that <br>
open_by_handle_at first fails to open and next is able to open the same <br>
handle. But I don't think it's a big deal for us.<br>
<br>
On 7/28/20 2:59 PM, Andrey Zhadchenko wrote:<br>
> criu uses fhandle from fdinfo to dump inotify objects. cgroup super block has<br>
> no export operations, but .encode_fh and .fh_to_dentry are needed for<br>
> inotify_fdinfo function and open_by_handle_at syscall in order to correctly<br>
> open files located on cgroupfs by fhandle.<br>
> Add hash table as a storage for inodes with exported fhandle.<br>
> <br>
> v3: use inode->i_gen to protect from i_ino reusage. increase fhandle size to<br>
> 2 * u32.<br>
> Add an option to take reference of inode in cgroup_find_inode, so no one can<br>
> delete recently found inode.<br>
> <br>
> <a href="https://jira.sw.ru/browse/PSBM-105889">https://jira.sw.ru/browse/PSBM-105889</a><br>
> Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com><br>
> ---<br>
> kernel/cgroup.c | 137 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-<br>
> 1 file changed, 136 insertions(+), 1 deletion(-)<br>
> <br>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c<br>
> index 9fdba79..267a5a4 100644<br>
> --- a/kernel/cgroup.c<br>
> +++ b/kernel/cgroup.c<br>
> @@ -62,6 +62,8 @@<br>
> #include <linux/kthread.h><br>
> #include <linux/ve.h><br>
> #include <linux/stacktrace.h><br>
> +#include <linux/exportfs.h><br>
> +#include <linux/time.h><br>
> <br>
> #include <linux/atomic.h><br>
> <br>
> @@ -765,6 +767,7 @@ static struct inode *cgroup_new_inode(umode_t mode, struct super_block *sb)<br>
> <br>
> if (inode) {<br>
> inode->i_ino = get_next_ino();<br>
> + inode->i_generation = get_seconds();<br>
<br>
We likely want to use prandom_u32() instead, see mainstream linux commit <br>
46c9a946d766 ("shmem: use monotonic time for i_generation"), they've <br>
change it for shmem to overcome overflow.<br>
<br>
> inode->i_mode = mode;<br>
> inode->i_uid = current_fsuid();<br>
> inode->i_gid = current_fsgid();<br>
> @@ -1390,9 +1393,140 @@ out:<br>
> }<br>
> #endif<br>
> <br>
> +/*<br>
> + * hashtable for inodes that have exported fhandles.<br>
> + * When we export fhandle, we add it's inode into<br>
> + * hashtable so we can find it fast<br>
> + */<br>
> +<br>
> +#define CGROUP_INODE_HASH_BITS 10<br>
> +static DEFINE_HASHTABLE(cgroup_inode_table, CGROUP_INODE_HASH_BITS);<br>
> +static DEFINE_SPINLOCK(cgroup_inode_table_lock);<br>
> +<br>
> +struct cg_inode_hitem {<br>
> + struct inode *inode;<br>
> + struct hlist_node hlist;<br>
> +};<br>
> +<br>
> +static inline unsigned long cgroup_inode_get_hash(unsigned int i_ino)<br>
> +{<br>
> + return hash_32(i_ino, CGROUP_INODE_HASH_BITS);<br>
> +}<br>
> +<br>
> +static struct cg_inode_hitem *cgroup_find_inode(unsigned long fhandle[2],<br>
> + char take_ref)<br>
> +{<br>
> + struct cg_inode_hitem *i;<br>
> + struct hlist_head *head = cgroup_inode_table<br>
> + + cgroup_inode_get_hash(fhandle[1]);<br>
> + struct cg_inode_hitem *found = NULL;<br>
> +<br>
> + spin_lock(&cgroup_inode_table_lock);<br>
> + hlist_for_each_entry(i, head, hlist) {<br>
> + if (i->inode->i_generation == fhandle[0] &&<br>
> + i->inode->i_ino == fhandle[1]) {<br>
> + /*<br>
> + * If we need to increase refcount, we should be aware<br>
> + * of possible deadlock. Another thread may have<br>
> + * started deleting this inode: iput->iput_final-><br>
> + * ->cgroup_delete_inode->cgroup_find_inode. if we<br>
> + * just call igrab, it will try to take i_lock and<br>
> + * this will result in deadlock, because deleting<br>
> + * thread has already taken it and waits on<br>
> + * cgroup_inode_table_lock to find inode in hashtable.<br>
> + * If i_count is zero, someone is deleting it -> skip.<br>
> + */<br>
> + if (take_ref)<br>
> + if (!atomic_inc_not_zero(&i->inode->i_count))<br>
> + continue;<br>
<br>
Should be break instead of continue? There can't be another entry in <br>
hash with same i_ino and i_generation.<br>
<br>
> + found = i;<br>
> + break;<br>
> + }<br>
> + }<br>
> + spin_unlock(&cgroup_inode_table_lock);<br>
> +<br>
> + return found;<br>
> +}<br>
> +<br>
> +static struct dentry *cgroup_fh_to_dentry(struct super_block *sb,<br>
> + struct fid *fid, int fh_len, int fh_type)<br>
> +{<br>
> + struct cg_inode_hitem *item;<br>
> + struct dentry *dentry = ERR_PTR(-ENOENT);<br>
> + unsigned long fhandle[2] = {fid->raw[0], fid->raw[1]};<br>
> +<br>
> + if (fh_len < 2)<br>
> + return NULL;<br>
> +<br>
> + item = cgroup_find_inode(fhandle, 1);<br>
> + if (item) {<br>
> + dentry = d_find_alias(item->inode);<br>
> + iput(item->inode);<br>
> + }<br>
> + return dentry;<br>
> +}<br>
> +<br>
> +static int cgroup_encode_fh(struct inode *inode, __u32 *fh, int *len,<br>
> + struct inode *parent)<br>
> +{<br>
> + struct cg_inode_hitem *item;<br>
> + struct hlist_head *head = cgroup_inode_table<br>
> + + cgroup_inode_get_hash(inode->i_ino);<br>
> + unsigned long fhandle[2] = {inode->i_generation, inode->i_ino};<br>
> +<br>
> + if (*len < 2) {<br>
> + *len = 2;<br>
> + return FILEID_INVALID;<br>
> + }<br>
> +<br>
> + if (!cgroup_find_inode(fhandle, 0)) {<br>
> + item = kmalloc(sizeof(struct cg_inode_hitem),<br>
> + GFP_KERNEL);<br>
> + /*<br>
> + * encode_fh is expected to return 255 (FILEID_INVALID)<br>
> + * in case of failure. We can't return ENOMEM, so<br>
> + * return FILEID_INVALID at least<br>
> + */<br>
> + if (!item)<br>
> + return FILEID_INVALID;<br>
> + item->inode = inode;<br>
> +<br>
> + spin_lock(&cgroup_inode_table_lock);<br>
> + hlist_add_head(&item->hlist, head);<br>
> + spin_unlock(&cgroup_inode_table_lock);<br>
> + }<br>
> +<br>
> + fh[0] = fhandle[0];<br>
> + fh[1] = fhandle[1];<br>
> + *len = 2;<br>
> + return 1;<br>
> +}<br>
> +<br>
> +static const struct export_operations cgroup_export_ops = {<br>
> + .encode_fh = cgroup_encode_fh,<br>
> + .fh_to_dentry = cgroup_fh_to_dentry,<br>
> +};<br>
> +<br>
> +static int cgroup_delete_inode(struct inode *inode)<br>
> +{<br>
> + struct cg_inode_hitem *item = NULL;<br>
> + unsigned long fhandle[2] = {inode->i_generation, inode->i_ino};<br>
> +<br>
> + item = cgroup_find_inode(fhandle, 0);<br>
> + if (item) {<br>
> + spin_lock(&cgroup_inode_table_lock);<br>
> + hlist_del(&item->hlist);<br>
> + spin_unlock(&cgroup_inode_table_lock);<br>
> +<br>
> + kfree(item);<br>
> + }<br>
> +<br>
> + return generic_delete_inode(inode);<br>
> +}<br>
> +<br>
> static const struct super_operations cgroup_ops = {<br>
> .statfs = simple_statfs,<br>
> - .drop_inode = generic_delete_inode,<br>
> + .drop_inode = cgroup_delete_inode,<br>
> .show_options = cgroup_show_options,<br>
> #ifdef CONFIG_VE<br>
> .show_path = cgroup_show_path,<br>
> @@ -1539,6 +1673,7 @@ static int cgroup_set_super(struct super_block *sb, void *data)<br>
> sb->s_blocksize_bits = PAGE_CACHE_SHIFT;<br>
> sb->s_magic = CGROUP_SUPER_MAGIC;<br>
> sb->s_op = &cgroup_ops;<br>
> + sb->s_export_op = &cgroup_export_ops;<br>
> <br>
> return 0;<br>
> }<br>
> <br>
<br>
-- <br>
Best regards, Tikhomirov Pavel<br>
Software Developer, Virtuozzo.<br>
</div>
</span></font></div>
</body>
</html>