[Devel] [PATCH rh7] fs: make overlayfs disabled in CT by default

Maxim Patlasov mpatlasov at virtuozzo.com
Thu Jul 7 10:00:40 PDT 2016


On 07/07/2016 04:10 AM, Vladimir Davydov wrote:

> On Wed, Jul 06, 2016 at 10:33:07AM -0700, Maxim Patlasov wrote:
>> On 07/06/2016 02:26 AM, Vladimir Davydov wrote:
>>
>>> On Tue, Jul 05, 2016 at 04:45:10PM -0700, Maxim Patlasov wrote:
>>>> Vova,
>>>>
>>>>
>>>> On 07/04/2016 11:03 AM, Maxim Patlasov wrote:
>>>>> On 07/04/2016 08:53 AM, Vladimir Davydov wrote:
>>>>>
>>>>>> On Tue, Jun 28, 2016 at 03:48:54PM -0700, Maxim Patlasov wrote:
>>>>>> ...
>>>>>>> @@ -643,6 +643,7 @@ static struct cgroup_subsys_state
>>>>>>> *ve_create(struct cgroup *cg)
>>>>>>>         ve->odirect_enable = 2;
>>>>>>>       ve->fsync_enable = 2;
>>>>>>> +    ve->experimental_fs_enable = 2;
>>>>>> For odirect_enable and fsync_enable, 2 means follow the host's config, 1
>>>>>> means enable unconditionally, and 0 means disable unconditionally. But
>>>>>> we don't want to allow a user inside a CT to enable this feature, right?
>>>>> I thought it's OK to allow user inside CT to enable it if host sysadmin is
>>>>> OK about it. The same logic as for odirect: by default
>>>>> ve0->experimental_fs_enable = 0, so whatever user inside CT writes to this
>>>>> knob, the feature is disabled. If sysadmin writes '1' to ve0->..., the
>>>>> feature becomes enabled. If an user wants to voluntarily disable it inside
>>>>> CT, that's OK too.
>>>>>
>>>>>> This is confusing. May be, we'd better add a new VE_FEATURE for the
>>>>>> purpose?
>>>>> Not sure right now. I'll look at it and let you know later.
>>>> Technically, it is very easy to implement new VE_FEATURE for overlayfs. But
>>>> this approach is less flexible because we return EPERM from ve_write_64 if
>>>> CT is running, and we'll need to involve userspace team to make the feature
>>>> configurable and (possibly) persistent. Do you think it's worthy for
>>>> something we'll get rid of soon anyway (I mean as soon as PSBM-47981
>>>> resolved)?
>>> Fair enough, not much point in introducing yet another feature for the
>>> purpose, at least right now, sysctl should do for the beginning.
>>>
>>> Come to think of it, do we really need this sysctl inside containers? I
>>> mean, by enabling this sysctl on the host we open a possible system-wide
>>> security hole, which a CT admin won't be able to mitigate by disabling
>>> overlayfs inside her CT. So why would she need it for? To prevent
>>> non-privileged CT users from mounting overlayfs inside a user ns? But
>>> overlayfs is not permitted to be mounted by a userns root anyway AFAICS.
>>> May be, just drop in-CT sysctl then?
>> Currently, anyone who can login into CT as root may mount overlayfs, then
>> try to exploit its weak sides. This is a problem.
>>
>> Until we ensure that overlayfs is production-ready (at least does not have
>> obvious breaches), let's disable it by default (of course, if ve != ve0).
>> Those who want to play with overlayfs at their own risk will enable it by
>> turning on some knob on host system (ve == ve0).
>>
>> I don't think that mixing trusted (overlayfs-enabled) CTs and not trusted
>> (overlayfs-disabled) CTs on the same physical node is important use-case for
>> now. So, any simple system-wide knob must work.
> <nod>
>
>> Essentially, the same scheme
>> with odirect: by default it is '0' in ve0 and the root inside CT cannot turn
>> it on; and if it is manually set to '1' in ve0, the behavior will depend on
>> per-CT root willing.
> No, that's not how it works. AFAICS (see may_use_odirect),
>
>    ve0 sysctl	ve sysctl	odirect allowed in ve?
>    x		0		0
>    x		1		1
>    x		2		x
>
> i.e. system-wide sysctl can't be used to disallow odirect inside a VE,
> while you want a different behavior AFAIU - you want to enable overlayfs
> if both ve0 sysctl and ve sysctl are set. That's why the patch looks
> confusing to me.

Oh, yeah, it's my fault -- I didn't read may_use_odirect() carefully 
enough. Now I see it checks ve0 only if per-CT sysctl is '2'.

>
> Let's only leave system-wide sysctl for permitting overlayfs. VE sysctl
> doesn't make any sense - only root user is allowed to mount overlayfs
> inside a CT and she can set this sysctl anyway.

OK, I agree.


More information about the Devel mailing list