[Devel] [PATCH rh7] fs: make overlayfs disabled in CT by default

Vladimir Davydov vdavydov at virtuozzo.com
Thu Jul 7 04:10:51 PDT 2016


On Wed, Jul 06, 2016 at 10:33:07AM -0700, Maxim Patlasov wrote:
> On 07/06/2016 02:26 AM, Vladimir Davydov wrote:
> 
> >On Tue, Jul 05, 2016 at 04:45:10PM -0700, Maxim Patlasov wrote:
> >>Vova,
> >>
> >>
> >>On 07/04/2016 11:03 AM, Maxim Patlasov wrote:
> >>>On 07/04/2016 08:53 AM, Vladimir Davydov wrote:
> >>>
> >>>>On Tue, Jun 28, 2016 at 03:48:54PM -0700, Maxim Patlasov wrote:
> >>>>...
> >>>>>@@ -643,6 +643,7 @@ static struct cgroup_subsys_state
> >>>>>*ve_create(struct cgroup *cg)
> >>>>>        ve->odirect_enable = 2;
> >>>>>      ve->fsync_enable = 2;
> >>>>>+    ve->experimental_fs_enable = 2;
> >>>>For odirect_enable and fsync_enable, 2 means follow the host's config, 1
> >>>>means enable unconditionally, and 0 means disable unconditionally. But
> >>>>we don't want to allow a user inside a CT to enable this feature, right?
> >>>I thought it's OK to allow user inside CT to enable it if host sysadmin is
> >>>OK about it. The same logic as for odirect: by default
> >>>ve0->experimental_fs_enable = 0, so whatever user inside CT writes to this
> >>>knob, the feature is disabled. If sysadmin writes '1' to ve0->..., the
> >>>feature becomes enabled. If an user wants to voluntarily disable it inside
> >>>CT, that's OK too.
> >>>
> >>>>This is confusing. May be, we'd better add a new VE_FEATURE for the
> >>>>purpose?
> >>>Not sure right now. I'll look at it and let you know later.
> >>Technically, it is very easy to implement new VE_FEATURE for overlayfs. But
> >>this approach is less flexible because we return EPERM from ve_write_64 if
> >>CT is running, and we'll need to involve userspace team to make the feature
> >>configurable and (possibly) persistent. Do you think it's worthy for
> >>something we'll get rid of soon anyway (I mean as soon as PSBM-47981
> >>resolved)?
> >Fair enough, not much point in introducing yet another feature for the
> >purpose, at least right now, sysctl should do for the beginning.
> >
> >Come to think of it, do we really need this sysctl inside containers? I
> >mean, by enabling this sysctl on the host we open a possible system-wide
> >security hole, which a CT admin won't be able to mitigate by disabling
> >overlayfs inside her CT. So why would she need it for? To prevent
> >non-privileged CT users from mounting overlayfs inside a user ns? But
> >overlayfs is not permitted to be mounted by a userns root anyway AFAICS.
> >May be, just drop in-CT sysctl then?
> 
> Currently, anyone who can login into CT as root may mount overlayfs, then
> try to exploit its weak sides. This is a problem.
> 
> Until we ensure that overlayfs is production-ready (at least does not have
> obvious breaches), let's disable it by default (of course, if ve != ve0).
> Those who want to play with overlayfs at their own risk will enable it by
> turning on some knob on host system (ve == ve0).
> 
> I don't think that mixing trusted (overlayfs-enabled) CTs and not trusted
> (overlayfs-disabled) CTs on the same physical node is important use-case for
> now. So, any simple system-wide knob must work.

<nod>

> Essentially, the same scheme
> with odirect: by default it is '0' in ve0 and the root inside CT cannot turn
> it on; and if it is manually set to '1' in ve0, the behavior will depend on
> per-CT root willing.

No, that's not how it works. AFAICS (see may_use_odirect),

  ve0 sysctl	ve sysctl	odirect allowed in ve?
  x		0		0
  x		1		1
  x		2		x

i.e. system-wide sysctl can't be used to disallow odirect inside a VE,
while you want a different behavior AFAIU - you want to enable overlayfs
if both ve0 sysctl and ve sysctl are set. That's why the patch looks
confusing to me.

Let's only leave system-wide sysctl for permitting overlayfs. VE sysctl
doesn't make any sense - only root user is allowed to mount overlayfs
inside a CT and she can set this sysctl anyway.


More information about the Devel mailing list