[Devel] [vz7 v2 PATCH 2/2] devcg: Allow wildcard exceptions in DENY child cgroups

Pavel Tikhomirov ptikhomirov at virtuozzo.com
Fri Jan 20 13:51:33 MSK 2023


I don't like this behavior:

[root at ptikh-vz7 ~]# mkdir /sys/fs/cgroup/devices/test1
[root at ptikh-vz7 ~]# echo "a *:* rwmM" > 
/sys/fs/cgroup/devices/test1/devices.deny
[root at ptikh-vz7 ~]# cat /sys/fs/cgroup/devices/test1/devices.list
[root at ptikh-vz7 ~]# echo "c *:* rwm" > 
/sys/fs/cgroup/devices/test1/devices.allow
[root at ptikh-vz7 ~]# mkdir /sys/fs/cgroup/devices/test1/test2
[root at ptikh-vz7 ~]# echo "b *:* rwm" > 
/sys/fs/cgroup/devices/test1/test2/devices.allow
[root at ptikh-vz7 ~]# cat /sys/fs/cgroup/devices/test1/test2/devices.list
c *:* rwm
b *:* rwm
[root at ptikh-vz7 ~]# echo "c *:* rwm" > 
/sys/fs/cgroup/devices/test1/devices.deny
[root at ptikh-vz7 ~]# cat /sys/fs/cgroup/devices/test1/test2/devices.list
[root at ptikh-vz7 ~]#

When we remove any exception in ancestors "fake"-wildcard rules disappear.

I don't like this behavior even more:

[root at ptikh-vz7 ~]# mkdir /sys/fs/cgroup/devices/test3
[root at ptikh-vz7 ~]# echo "a *:* rwmM" > 
/sys/fs/cgroup/devices/test3/devices.deny
[root at ptikh-vz7 ~]# cat /sys/fs/cgroup/devices/test3/devices.list
[root at ptikh-vz7 ~]# echo "c *:* rwm" > 
/sys/fs/cgroup/devices/test3/devices.allow
[root at ptikh-vz7 ~]# mkdir /sys/fs/cgroup/devices/test3/test4
[root at ptikh-vz7 ~]# cat /sys/fs/cgroup/devices/test3/test4/devices.list
c *:* rwm
[root at ptikh-vz7 ~]# echo $$ > /sys/fs/cgroup/devices/test3/test4/tasks
[root at ptikh-vz7 ~]# mknod test c 1 3
mknod: ‘test’: Operation not permitted

Cgroup which is allowed to mknod all char devices can't mknod /dev/null.

I believe there would be more corner cases which are working similarly 
bad. I don't feel safe to change match_exception as this can lead to 
security issues. And properly reviewing such a change takes too much time.

>> If you have a node with kernel with this patch installed I would like to play with it, so that it would be easier to review.

Next time just provide node with kernel with patches applied, so that I 
don't need to build it myself, please.

On 19.01.2023 18:15, nb wrote:
> 
> 
> On 19.01.23 г. 17:02 ч., Pavel Tikhomirov wrote:
>> I believe this is not covering all cases, for instance it would break 
>> adding rules to second level nested cgroup, if second level nested 
>> cgroup is in "default deny" and it's parent is in "default deny" and 
>> none of them have CGRP_VE_ROOT set. In parent there is allowing 
>> wildcard rule and adding same allowing wildcard rule to child would 
>> fail as verify_new_ex -> match_exception would return false.
> 
> If we don't have CGRP_VE_ROOT then the code should remain as is 
> currently in upstream (see my comment about using ve_is_super below). 
> After all this is only a problem for our container setup.
> 
> 
> A  (default deny)
>   \
>    B (default deny) (allow wild card)
>     \
>      C <--- Add an allow wild card will be ignored and check will be 
> deferred to parent.
> 
> Can you describe visually what scenario you have in mind?
> 
> 
> 
> <snip>
> 
>>> @@ -445,10 +450,13 @@ static bool match_exception_partial(struct 
>>> list_head *exceptions, short type,
>>>    *
>>>    * returns: true in case it matches an exception completely
>>>    */
>>> -static bool match_exception(struct list_head *exceptions, short type,
>>> -                u32 major, u32 minor, short access)
>>> +static bool match_exception(struct dev_cgroup *dev_cgroup, short type,
>>> +                u32 major, u32 minor, short access, bool check_parent)
>>>   {
>>>       struct dev_exception_item *ex;
>>> +    struct cgroup *cgrp = dev_cgroup->css.cgroup;
>>> +    struct list_head *exceptions = &dev_cgroup->exceptions;
>>> +    bool wildcard_skipped = false;
>>>
>>>       list_for_each_entry_rcu(ex, exceptions, list) {
>>>           short mismatched_bits;
>>> @@ -462,6 +470,11 @@ static bool match_exception(struct list_head 
>>> *exceptions, short type,
>>>               continue;
>>>           if (ex->minor != ~0 && ex->minor != minor)
>>>               continue;
>>> +        /* skip wildcard rule if we are child cg */
>>> +        if (is_wildcard_exception(ex) && !test_bit(CGRP_VE_ROOT, 
>>> &cgrp->flags)) {
>>
>> Comment does not correspond to what you do, so either comment is wrong 
>> or what you do is wrong. Note: CGRP_VE_ROOT bit is only set on root 
>> container cgroups when container is running. O maybe I'm not getting 
>> "child cg" thing.
> 
> A child cg is really any cgroup under an CGRP_VE_ROOT group. What did 
> you think it was? Actually thinking more about it I think in this 
> condition I should also explicitly be checking if we are in a container 
> context via ve_is_super().
> 
>>
>>> +            wildcard_skipped = true;
>>> +            continue;
>>> +        }
>>>
>>>           /* provided access cannot have more than the exception rule */
>>>           mismatched_bits = access & (~ex->access) & ~ACC_MOUNT;
> 
> 
> <snip>

-- 
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.


More information about the Devel mailing list