<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Aug 11, 2014 at 8:50 AM, Andrew Vagin <span dir="ltr"><<a href="mailto:avagin@parallels.com" target="_blank">avagin@parallels.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div><div>On Mon, Aug 11, 2014 at 05:24:05PM +0400, Pavel Emelyanov wrote:<br>
> On 08/11/2014 05:17 PM, Andrew Vagin wrote:<br>
> > Hi Sophie,<br>
> ><br>
> > On Fri, Aug 08, 2014 at 10:21:19PM -0700, Sophie Blee-Goldman wrote:<br>
> >> Adds basic support for user namespaces by dumping and restoring<br>
> >> the namespace itself and the uid/gid maps of the root process.<br>
> ><br>
> > How do you test your patches?<br>
><br>
> I have the same question. It's OK if the initial version of userns<br>
> only supports some limited stuff, but we should know what it is :)<br>
><br>
> > ZDTM test suite can execute tests in<br>
> > namespaces, but the current version knows nothing about userns. Have you<br>
> > try to add userns in ZDTM lib?<br>
> ><br></div></div></blockquote><div><br></div><div>I wrote a short script for testing that starts several processes<br></div><div>(with different sets of credentials) in a namespace jail, then dumps and restores</div>
<div>the process tree.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div><div>
> >><br>
> >> Currently depends on a kernel patch to avoid failing on the prctl<br>
> >> syscall by checking for CAP_SYS_RESOURCE in the user namespace<br>
> >> instead of in the global one.<br>
> ><br>
> > It isn't so simple.<br>
> > Kirill is trying to fix this issue: <a href="https://lkml.org/lkml/2014/8/4/570" target="_blank">https://lkml.org/lkml/2014/8/4/570</a><br>
> ><br>
> > We have a number of other kernel issues, which are described here:<br>
> > <a href="http://criu.org/UserNamespace" target="_blank">http://criu.org/UserNamespace</a><br>
> ><br>
> > Have you seen my patches for userns?<br>
> > <a href="http://lists.openvz.org/pipermail/criu/2014-February/012399.html" target="_blank">http://lists.openvz.org/pipermail/criu/2014-February/012399.html</a><br>
> ><br>
> > and here is updated version:<br>
> > <a href="https://github.com/avagin/criu/tree/userns2" target="_blank">https://github.com/avagin/criu/tree/userns2</a><br>
> ><br>
> > I suggest to find the difference between our patch sets and make a new one,<br>
> > which will contain best things from both ones.<br>
><br>
> Andrey, can you suggest which things are best in both sets? :)<br>
<br>
</div></div>For example what I see now. My version:<br>
* doesn't have comments, which I enumerated for this patch<br>
* handles ghost uids<br>
* fixes zdtm lib to test user namespaces<br>
* prepares the root mount for pivot_root<br>
* set PR_SET_DUMPABLE to have access to proc files<br>
* extends ZDTM lib to execute tests in userns<br>
<br>
Sophie's version:<br>
* dumps capability from parasite. I don't know which capabilities are<br>
shown in /proc/PID/status. </blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">* ... </blockquote>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br>
Sophie, could you continue this list or comment my list.<br></blockquote><div><br></div><div>I think you've listed all the major points. The thing we seem to handle</div><div>in a significantly different way is getting the ids inside the namespace.</div>
<div>I think there's something to be said for dumping those in the parasite</div><div>alongside the capabilities, which would avoid the need to store the</div><div>maps as a global variable (Also, it's possible I overlooked something but</div>
<div>I think you may be leaking the memory malloc'ed for these maps).</div><div><br></div><div>That said, the simplicity of just calling getresuid and getresgid from the</div><div>parasite doesn't extend to getting the ghost files' ids, since it looks like</div>
<div>a lot more would have to be moved to the parasite. So maybe it's best to</div><div>use your method and just translate the ids. On that note, one thing I noticed</div><div>in your code is that during a dump you translate the ids before calling</div>
<div>may_dump(), which could be problematic if criu is ever to support dumping</div><div>user namespaces by non-root users. A similar issue arises for may_restore().</div></div><br></div><div class="gmail_extra"><br></div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<span style="font-family:arial,sans-serif;font-size:12.727272033691406px">{u,g}id_map files can be written only once. If you use fprintf, you<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">don't know how many times write will be called.</span><br style="font-family:arial,sans-serif;font-size:12.727272033691406px">
<span style="font-family:arial,sans-serif;font-size:12.727272033691406px">kernel/user_namespace.c:<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">static ssize_t map_write(struct file *file, const char __user *buf,<br>
</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px">...<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px"> ret = -EPERM;<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px"> /* Only allow one successful write to the map */<br>
</span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px"> if (map->nr_extents != 0)<br></span><span style="font-family:arial,sans-serif;font-size:12.727272033691406px"> goto out;</span><br style="font-family:arial,sans-serif;font-size:12.727272033691406px">
<span style="font-family:arial,sans-serif;font-size:12.727272033691406px">...</span></blockquote><div><div style="font-family:arial,sans-serif;font-size:12.727272033691406px">My mistake, I had thought the restriction was that the id_maps could only</div>
<div style="font-family:arial,sans-serif;font-size:12.727272033691406px">be opened for writing once.</div></div><div style="font-family:arial,sans-serif;font-size:12.727272033691406px"><br></div><blockquote style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex" class="gmail_quote">
> +int dump_user_ns(int ns_pid, int ns_id)<br>> +{<br>> + int fd, ret, n_uid_entries, n_gid_entries;<br>> + char map_fname[PATH_MAX];<br>> +<br>> + LIST_HEAD(uid_list);<br>Why do we need this list? Can we write uid_maps in an image without<br>
collection in the list?</blockquote><div style="font-family:arial,sans-serif;font-size:12.727272033691406px"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div><br></div></blockquote><div><br></div><div>I felt that reading the map entries into a list first was cleaner than reading</div><div>them directly into the array and resizing it while parsing the uid/gid maps. But</div>
<div>either way of doing it seems reasonable, so I wouldn't say we *need* this list.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div>> + /* read uid map */<br>> + sprintf(map_fname, "/proc/%d/uid_map", ns_pid);<br><br></div>can we use fopen_proc here?<br></blockquote><div><br></div><div>I hadn't noticed the 'open_proc' set of functions, but yes that would be better.</div>
<div><br></div><div>Thanks,</div><div>Sophie</div>
</div></div>