<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Oct 3, 2013 at 3:44 AM, Eric W. Biederman <span dir="ltr"><<a href="mailto:ebiederm@xmission.com" target="_blank">ebiederm@xmission.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div>Amir Goldstein <<a href="mailto:amir@cellrox.com" target="_blank">amir@cellrox.com</a>> writes:<br>
<br>
> What we would really like to see is a setns() style API that can be used to<br>
> add a device in the context of a namespace in either a "shared" or<br>
> "private" mode.<br>
<br>
</div>I think you mean an "ip link set dev FOO netns XXX" style API.<br></blockquote><div><br></div><div>correct.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br>
Right now one of the best suggestions on the table is:<br>
<br>
mkdir -p /dev/container/X<br>
ln /dev/zero /dev/container/X/zero<br>
ln /dev/null /dev/container/X/null<br>
...<br>
<br>
With /dev/container/X mounted on /dev for container X.<br>
<br>
Which seems to cover putting a device in a namespace, while allowing<br>
things to still be reasonably managed.<br>
<br>
There are a few other variations on that scheme but nothing that says we<br>
must have kernel support or to create any kind of kernel context beyond<br>
which directory the device nodes live in.<br>
<div><br>
> This kind of API is a required building block for us to write device<br>
> drivers that are namespace aware in a way that userspace will have<br>
> enough flexibility for dynamic configuration.<br>
><br>
> We are trying to come up with a proposal for that sort of API. When<br>
> we have something decent, we shall post it.<br>
<br>
</div>I really think what you need to write are special drivers that<br>
facilitate your use case.<br>
<br>
For the networking stack we wound up adding veth pairs, and macvlan<br>
devices, to handle the common sharing modes.<br>
<br>
Outside of your sharing situation I am not seeing any need or any<br>
advantage of creating devices that are modified to be sharable and I am<br>
seeing a lot of disadvantages to implementing things that way. The<br>
biggest is that you seem to be working independently of the subsystem<br>
maintainers of those devices which is generally a poor idea.<br>
<br>
Unprivileged creation of device nodes we can handle if it can be shown<br>
that it is safe to create device nodes.<br>
<br>
As I understand your problem you are trying to multiplex a device by<br>
building a device with a built in stop light. Where one opener can<br>
write and the other openers are stopped/dropped. That sounds very<br>
similar to macvlan, or ethernet bridging. From the patches you have<br>
floated I suspect it would be very simple to build and just need a<br>
little bit of glue.<br></blockquote><div><br></div><div>Excellent! Let's focus the discussion on a new device driver we want to write,</div><div>one that is namespace-aware. Let's call this device driver valarm-dev.</div>
<div>Like Android's alarm-dev, valarm-dev can be used to request RTC wakeup calls</div><div>from user space and to get/set RTC values, but with valarm-dev each container</div><div>may have its own value for the current time.</div>
<div><br></div><div>As you can see in our patch set, we already have a version of alarm-dev that maintains</div><div>its state inside a context instead of in a global variable, so it is capable of providing</div><div>a different context per namespace.</div>
<div><br></div><div>And now for the $1M question: to *which* namespace do we attribute the current realtime clock time?</div><div>To the UTS namespace (because T historically stands for Time)? To a device namespace?</div><div>
Even if a device namespace existed, we would not want to tie the policy decision of "separate time"</div><div>to a very wide definition of "separate devices".</div><div><br></div><div>So what we want to create is an API for device driver writers that makes it possible to write a namespace-aware</div>
<div>device and lets userspace configure when the namespace-aware device context is unshared.</div><div><br></div><div>We would like to share our very initial thoughts on how this could be implemented:</div>
<div>- Extend the register_pernet_subsys/device(ops) API to a register_perns_subsys/device(nstype, ops) API</div><div>- Extend pernet_operations to perns_operations, which includes optional migrate() and/or unshare() ops</div><div>
- Let valarm-dev call register_peruser_subsys/device(&alarm_userns_ops)</div><div>- Implement a new syscall (or netlink command, if that makes more sense): setdevns(int dev_fd, int ns_fd, int nstype, int flags)</div>
<div>- Unlike the netlink set netns case, this API is not used solely to *move* a device to a different namespace,</div><div> but also to *unshare* a device context between namespaces, for those devices that registered unshare() ops.</div>
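<div><br></div><div>To make the list above a bit more concrete, here is a rough userspace mock of the dispatch we have in mind.</div><div>It is only a sketch: none of these names exist in the kernel today, struct ns_ctx, setdevns_mock() and</div><div>SETDEVNS_UNSHARE are made up for illustration, and the real setdevns() would take file descriptors rather than pointers.</div>
<pre>
```c
/* Userspace mock only. Every name here is hypothetical. */

enum ns_type { NS_NET, NS_UTS, NS_USER, NS_TYPE_MAX };

/* Stand-in for one namespace instance. */
struct ns_ctx {
    enum ns_type type;
    long rtc_offset;   /* per-namespace state a valarm-like driver might keep */
};

/* pernet_operations-like table with the two proposed optional hooks. */
struct perns_operations {
    int  (*init)(struct ns_ctx *ns);
    void (*exit)(struct ns_ctx *ns);
    /* optional: move the device context from one namespace to another */
    int  (*migrate)(struct ns_ctx *from, struct ns_ctx *to);
    /* optional: give 'to' a private copy of the context held by 'from' */
    int  (*unshare)(struct ns_ctx *from, struct ns_ctx *to);
};

#define SETDEVNS_UNSHARE 0x1   /* hypothetical flag value */

/* One registered ops table per namespace type (a simplification). */
static const struct perns_operations *registered[NS_TYPE_MAX];

int register_perns_device(enum ns_type nstype,
                          const struct perns_operations *ops)
{
    if ((unsigned)nstype >= (unsigned)NS_TYPE_MAX || !ops)
        return -1;
    if (registered[nstype])
        return -1;             /* this mock allows one driver per type */
    registered[nstype] = ops;
    return 0;
}

/* Mock of setdevns(): picks unshare() or migrate() based on flags. */
int setdevns_mock(struct ns_ctx *from, struct ns_ctx *to, int flags)
{
    const struct perns_operations *ops;

    if (!from || !to || from->type != to->type)
        return -1;
    if ((unsigned)from->type >= (unsigned)NS_TYPE_MAX)
        return -1;
    ops = registered[from->type];
    if (!ops)
        return -1;
    if (flags & SETDEVNS_UNSHARE)
        return ops->unshare ? ops->unshare(from, to) : -1;
    return ops->migrate ? ops->migrate(from, to) : -1;
}

/* A valarm-like driver that only supports unshare(). */
static int valarm_unshare(struct ns_ctx *from, struct ns_ctx *to)
{
    to->rtc_offset = from->rtc_offset;  /* starts as a copy, may diverge later */
    return 0;
}

const struct perns_operations valarm_ops = { .unshare = valarm_unshare };
```
</pre>
<div>In this mock, a driver that registers only unshare() rejects a plain move, which is the distinction from the netlink set netns case that we are after.</div>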
<div><br></div><div>This is our missing piece of the puzzle.</div><div>After that, whether we make changes to existing drivers (e.g. evdev) or write new virtualized drivers (e.g. vevdev)</div><div>is a technicality. We have no strong preference and will go whichever way seems more maintainable.</div>
<div><br></div><div>What do you think of this master plan?</div><div><br></div><div>P.S. Please try to refrain from addressing the validity of the use case of alarm-dev in particular,</div><div>as we do not wish to engage in "Android sucks" wars.</div>
<div>We simply want to present the case for improving the namespace infrastructure to cater to the needs</div><div>of device driver writers who wish to tailor their drivers for container-based products.</div><div><br></div>
<div>Cheers,</div><div>Amir.</div><div> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<span><font color="#888888"><br>
Eric<br>
</font></span></blockquote></div><br></div></div>