[Devel] [vz7 v2 PATCH 2/2] devcg: Allow wildcard exceptions in DENY child cgroups
Pavel Tikhomirov
ptikhomirov at virtuozzo.com
Fri Jan 20 13:51:33 MSK 2023
I don't like this behavior:
[root at ptikh-vz7 ~]# mkdir /sys/fs/cgroup/devices/test1
[root at ptikh-vz7 ~]# echo "a *:* rwmM" >
/sys/fs/cgroup/devices/test1/devices.deny
[root at ptikh-vz7 ~]# cat /sys/fs/cgroup/devices/test1/devices.list
[root at ptikh-vz7 ~]# echo "c *:* rwm" >
/sys/fs/cgroup/devices/test1/devices.allow
[root at ptikh-vz7 ~]# mkdir /sys/fs/cgroup/devices/test1/test2
[root at ptikh-vz7 ~]# echo "b *:* rwm" >
/sys/fs/cgroup/devices/test1/test2/devices.allow
[root at ptikh-vz7 ~]# cat /sys/fs/cgroup/devices/test1/test2/devices.list
c *:* rwm
b *:* rwm
[root at ptikh-vz7 ~]# echo "c *:* rwm" >
/sys/fs/cgroup/devices/test1/devices.deny
[root at ptikh-vz7 ~]# cat /sys/fs/cgroup/devices/test1/test2/devices.list
[root at ptikh-vz7 ~]#
When we remove any exception in ancestors "fake"-wildcard rules disappear.
I don't like this behavior even more:
[root at ptikh-vz7 ~]# mkdir /sys/fs/cgroup/devices/test3
[root at ptikh-vz7 ~]# echo "a *:* rwmM" >
/sys/fs/cgroup/devices/test3/devices.deny
[root at ptikh-vz7 ~]# cat /sys/fs/cgroup/devices/test3/devices.list
[root at ptikh-vz7 ~]# echo "c *:* rwm" >
/sys/fs/cgroup/devices/test3/devices.allow
[root at ptikh-vz7 ~]# mkdir /sys/fs/cgroup/devices/test3/test4
[root at ptikh-vz7 ~]# cat /sys/fs/cgroup/devices/test3/test4/devices.list
c *:* rwm
[root at ptikh-vz7 ~]# echo $$ > /sys/fs/cgroup/devices/test3/test4/tasks
[root at ptikh-vz7 ~]# mknod test c 1 3
mknod: ‘test’: Operation not permitted
Cgroup which is allowed to mknod all char devices can't mknod /dev/null.
I believe there would be more corner cases which are working similarly
bad. I don't feel safe to change match_exception as this can lead to
security issues. And properly reviewing such a change takes too much time.
>> If you have a node with kernel with this patch installed I would like to play with it, so that it would be easier to review.
Next time just provide node with kernel with patches applied, so that I
don't need to build it myself, please.
On 19.01.2023 18:15, nb wrote:
>
>
> On 19.01.23 г. 17:02 ч., Pavel Tikhomirov wrote:
>> I believe this is not covering all cases, for instance it would break
>> adding rules to second level nested cgroup, if second level nested
>> cgroup is in "default deny" and it's parent is in "default deny" and
>> none of them have CGRP_VE_ROOT set. In parent there is allowing
>> wildcard rule and adding same allowing wildcard rule to child would
>> fail as verify_new_ex -> match_exception would return false.
>
> If we don't have CGRP_VE_ROOT then the code should remain as is
> currently in upstream (see my comment about using ve_is_super below).
> After all this is only a problem for our container setup.
>
>
> A (default deny)
> \
> B (default deny) (allow wild card)
> \
> C <--- Add an allow wild card will be ignored and check will be
> deferred to parent.
>
> Can you describe visually what scenario you have in mind?
>
>
>
> <snip>
>
>>> @@ -445,10 +450,13 @@ static bool match_exception_partial(struct
>>> list_head *exceptions, short type,
>>> *
>>> * returns: true in case it matches an exception completely
>>> */
>>> -static bool match_exception(struct list_head *exceptions, short type,
>>> - u32 major, u32 minor, short access)
>>> +static bool match_exception(struct dev_cgroup *dev_cgroup, short type,
>>> + u32 major, u32 minor, short access, bool check_parent)
>>> {
>>> struct dev_exception_item *ex;
>>> + struct cgroup *cgrp = dev_cgroup->css.cgroup;
>>> + struct list_head *exceptions = &dev_cgroup->exceptions;
>>> + bool wildcard_skipped = false;
>>>
>>> list_for_each_entry_rcu(ex, exceptions, list) {
>>> short mismatched_bits;
>>> @@ -462,6 +470,11 @@ static bool match_exception(struct list_head
>>> *exceptions, short type,
>>> continue;
>>> if (ex->minor != ~0 && ex->minor != minor)
>>> continue;
>>> + /* skip wildcard rule if we are child cg */
>>> + if (is_wildcard_exception(ex) && !test_bit(CGRP_VE_ROOT,
>>> &cgrp->flags)) {
>>
>> Comment does not correspond to what you do, so either comment is wrong
>> or what you do is wrong. Note: CGRP_VE_ROOT bit is only set on root
>> container cgroups when container is running. O maybe I'm not getting
>> "child cg" thing.
>
> A child cg is really any cgroup under an CGRP_VE_ROOT group. What did
> you think it was? Actually thinking more about it I think in this
> condition I should also explicitly be checking if we are in a container
> context via ve_is_super().
>
>>
>>> + wildcard_skipped = true;
>>> + continue;
>>> + }
>>>
>>> /* provided access cannot have more than the exception rule */
>>> mismatched_bits = access & (~ex->access) & ~ACC_MOUNT;
>
>
> <snip>
--
Best regards, Tikhomirov Pavel
Software Developer, Virtuozzo.
More information about the Devel
mailing list