[CRIU] [PATCH] cgroups: Rework "devices" controller properties restoration

Tycho Andersen tycho.andersen at canonical.com
Mon Sep 12 07:59:32 PDT 2016


Hi Cyrill,

Sorry for the wait :)

On Thu, Sep 01, 2016 at 04:41:26PM +0300, Cyrill Gorcunov wrote:
> Currently if there are several subcgroups in devices container
> we're trying to deny any device by default first before we're
> setting up allowed device from the image. That's is prohibited
> by kernel because if the cgroup has an active parent noone can
> add a rule to deny devices. The logic in kernel is simple: once
> devices are denied from top level, all descendant cgroups are
> automatically propagated with such limitation.
> 
> Thus what we need is to setup "deny" rule on toplevel cgroup.
> Thus here is a proposal
> 
>  - in restore_special_props pass a level number so we won't try
>    to deny devices on children which already have this rule
>  - write global deny rule _iif_ it's a fresh cgroup created,
>    if cgroup is already existing we assume the user already
>    configured it

I think this is still wrong, because if we have the situation of:

/foo/devices.list: c *:* m
/foo/bar/devices.list: c 1:3 m

this will restore it incorrectly; we need to write a deny rule, and
then only allow 'c 1:3 m' in /foo/bar. If we don't write a deny, we'll
end up with both 'c *:* m' and 'c 1:3 m':

root at hopstrocity:/sys/fs/cgroup/devices/foo# cat devices.list 
c *:* m
root at hopstrocity:/sys/fs/cgroup/devices/foo# mkdir bar
root at hopstrocity:/sys/fs/cgroup/devices/foo# cd bar/
root at hopstrocity:/sys/fs/cgroup/devices/foo/bar# echo a > devices.deny 
root at hopstrocity:/sys/fs/cgroup/devices/foo/bar# echo 'c 1:3 m' > devices.allow 
root at hopstrocity:/sys/fs/cgroup/devices/foo/bar# cat devices.list 
c 1:3 m
root at hopstrocity:/sys/fs/cgroup/devices/foo/bar# cd ..
root at hopstrocity:/sys/fs/cgroup/devices/foo# rmdir bar
root at hopstrocity:/sys/fs/cgroup/devices/foo# mkdir bar
root at hopstrocity:/sys/fs/cgroup/devices/foo# cd bar/
root at hopstrocity:/sys/fs/cgroup/devices/foo/bar# echo 'c 1:3 m' > devices.allow 
root at hopstrocity:/sys/fs/cgroup/devices/foo/bar# cat devices.list 
c *:* m
c 1:3 m

So I think we have to write the 'a' in devices.deny all the time to
get the right result. I suppose we could do some smart checking to
figure out whether the parent is the same as the current cgroup to
avoid writing anything at all, but that's the only case where we don't
need to write it.

I think the right thing to do is in the patchset I sent. The reason the test
doesn't try to restore it is because the cgroup already exists
(because we don't have a way to delete it in the test harness right
now). I'll see about sending a patch for that in the series I sent as
well.

Tycho

> Signed-off-by: Cyrill Gorcunov <gorcunov at virtuozzo.com>
> ---
> 
> Guys, take a look please, this is on top of our vz7 CRIU but
> I'll portforward it upon review.
> 
>  criu/cgroup.c | 86 ++++++++++++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 62 insertions(+), 24 deletions(-)
> 
> diff --git a/criu/cgroup.c b/criu/cgroup.c
> index e316197..7517009 100644
> --- a/criu/cgroup.c
> +++ b/criu/cgroup.c
> @@ -1319,14 +1319,18 @@ static int prepare_cgroup_dir_properties(char *path, int off, CgroupDirEntry **e
>  				if (special)
>  					continue;
>  
> -				/* The devices cgroup must be restored in a
> +				/*
> +				 * The devices cgroup must be restored in a
>  				 * special way: only the contents of
>  				 * devices.list can be read, and it is a
>  				 * whitelist of all the devices the cgroup is
> -				 * allowed to create. To re-creat this
> -				 * whitelist, we first deny everything via
> -				 * devices.deny, and then write the list back
> -				 * into devices.allow.
> +				 * allowed to create.
> +				 *
> +				 * If the cgroup already exist then we assume
> +				 * the default "deny" strategy already set by
> +				 * a caller, if it doesn't then our "special"
> +				 * properties handling routine will deny
> +				 * everything on toplevel cgroup.
>  				 *
>  				 * Further, we must have a write() call for
>  				 * each line, because the kernel only parses
> @@ -1346,22 +1350,7 @@ static int prepare_cgroup_dir_properties(char *path, int off, CgroupDirEntry **e
>  					 * this stuff.
>  					 */
>  					pe->perms->mode = 0200;
> -
> -					pe->name = "devices.deny";
> -					pe->value = "a";
> -					ret = restore_cgroup_prop(e->properties[j], path, off2);
> -					pe->name = old_name;
> -					pe->value = old_val;
> -
> -					if (ret < 0)
> -						return -1;
> -
> -					pe->name = xstrdup("devices.allow");
> -					if (!pe->name) {
> -						pe->name = old_name;
> -						return -1;
> -					}
> -					xfree(old_name);
> +					pe->name = "devices.allow";
>  
>  					pos = pe->value;
>  					while (*pos) {
> @@ -1374,6 +1363,7 @@ static int prepare_cgroup_dir_properties(char *path, int off, CgroupDirEntry **e
>  						}
>  						pos += offset;
>  					}
> +					pe->name = old_name;
>  					pe->value = old_val;
>  
>  				} else if (restore_cgroup_prop(e->properties[j], path, off2) < 0) {
> @@ -1411,12 +1401,58 @@ int prepare_cgroup_properties(void)
>  	return 0;
>  }
>  
> -static int restore_special_props(char *paux, size_t off, CgroupDirEntry *e)
> +static int restore_special_props(char *paux, size_t off, CgroupDirEntry *e, bool toplevel)
>  {
>  	int i, j;
>  
>  	pr_info("Restore special props\n");
>  
> +	/*
> +	 * For freshly created toplevel cgroup we need to
> +	 * take care of devices list separately: deny everything
> +	 * by default and setup allowed devices. Upon children
> +	 * creation the list get propagated.
> +	 */
> +	if (toplevel) {
> +		for (j = 0; j < e->n_properties; j++) {
> +			CgroupPropEntry *prop = e->properties[j];
> +			char *old_val, *old_name, *pos;
> +			int ret;
> +
> +			if (strcmp(prop->name, "device.list"))
> +				continue;
> +
> +			prop->perms->mode = 0200;
> +
> +			old_val = prop->value;
> +			old_name = prop->name;
> +
> +			prop->name = "devices.deny";
> +			prop->value = "a";
> +			ret = restore_cgroup_prop(prop, paux, off);
> +			prop->value = old_val;
> +
> +			if (ret < 0) {
> +				prop->name = old_name;
> +				return -1;
> +			}
> +
> +			prop->name = "devices.allow";
> +			pos = prop->value;
> +			while (*pos) {
> +				int offset = next_device_entry(pos);
> +				prop->value = pos;
> +				ret = restore_cgroup_prop(prop, paux, off);
> +				if (ret < 0) {
> +					prop->value = old_val;
> +					return -1;
> +				}
> +				pos += offset;
> +			}
> +			prop->value = old_val;
> +		}
> +	}
> +
>  	for (i = 0; special_props[i]; i++) {
>  		const char *name = special_props[i];
>  
> @@ -1490,8 +1526,10 @@ static int prepare_cgroup_dirs(char **controllers, int n_controllers, char *paux
>  				return -1;
>  
>  			for (j = 0; j < n_controllers; j++) {
> -				if (!strcmp(controllers[j], "cpuset") || !strcmp(controllers[j], "memory")) {
> -					if (restore_special_props(paux, off2, e) < 0) {
> +				if (!strcmp(controllers[j], "cpuset") ||
> +				    !strcmp(controllers[j], "memory") ||
> +				    !strcmp(controllers[j], "devices")) {
> +					if (restore_special_props(paux, off2, e, i == 0) < 0) {
>  						pr_err("Restoring special cpuset props failed!\n");
>  						return -1;
>  					}
> -- 
> 2.7.4
> 


More information about the CRIU mailing list