[CRIU] AUFS Support in CRIU
Saied Kazemi
saied at google.com
Tue Aug 19 23:21:54 PDT 2014
On Tue, Aug 19, 2014 at 3:03 AM, Pavel Emelyanov <xemul at parallels.com>
wrote:
> On 08/18/2014 08:43 PM, Saied Kazemi wrote:
>
> > > @@ -197,6 +199,20 @@ int fill_fdlink(int lfd, const struct
> fd_parms *p, struct fd_link *link)
> > > return -1;
> > > }
> > >
> > > + /*
> > > + * If the link file descriptor (lfd) points to a
> > > + * file in an AUFS branch, the pathname should be
> > > + * replaced with a pathname from root.
> > > + */
> > > + if (opts.aufs) {
> > > + int n = sizeof link->name - 1;
> > > + n = fixup_aufs_path(&link->name[1], n, true);
> >
> > Since you say, that we only see spoiled paths in map_files, do we
> need
> > the path fixup here? Isn't the one performed in parsing the mappings
> > not enough?
> >
> >
> > We need this logic here because the link->name is in an AUFS branch.
> For example, these are the paths that are passed to fill_fdlink() during my
> test:
> >
> > /dev/null
> > /var/lib/docker/aufs/diff/<ID>/bin/busybox
> > /var/lib/docker/aufs/diff/<ID>/lib/libuClibc-0.9.33.2.so <
> http://libuClibc-0.9.33.2.so>
> > /var/lib/docker/aufs/diff/<ID>/lib/ld64-uClibc-0.9.33.2.so <
> http://ld64-uClibc-0.9.33.2.so>
> > /
>
> OK, the only way why spoiled paths leak there is through the
> dump_filemap(),
> and the file we want to dump there was already seen in parse_smaps(). Can
> we
> make path fixup only once? I think we can
>
> * fixup the path in parse_smaps()
> * put one on the vma_area() (can be unione-ed with page_bitmap, it's
> restore field)
> * then in dump_filemap() put one as the link pointer on fdparms
>
>
> after this the if (!p->link) check in dump_one_reg_file() would avoid
> calling
> the fill_fdlink.
>
> What do you think?
>
It's a good idea and I reworked the code accordingly. Please see the
attached patch to make sure that it's how you meant.
>
> >
> >
> > > + if (n < 0)
> > > + return -1;
> > > + if (n > 0)
> > > + len = n;
> > > + }
> > > +
> > > link->len = len + 1;
> > > return 0;
> > > }
> >
> > > +int fixup_aufs_path(char *path, int size, bool chop)
> > > +{
> > > + char rpath[PATH_MAX];
> > > + int n;
> > > + int blen;
> > > +
> > > + if (aufs_branches == NULL) {
> > > + pr_err("No aufs branches to search for %s\n", path);
> > > + return -1;
> >
> > Presumably, if we don't have AUFS we shouldn't fail :)
> >
> >
> > If we don't have AUFS, we should not specify --aufs and, therefore, we
> should not be calling
> > fixup_aufs_path(). So, if we get to fixup_aufs_path() but don't have
> branch information, it's
> > an error.
> >
> > Alternatively, we can decide not to specify --aufs and call AUFS code
> regardless, returning
> > without error if there's no AUFS. But I added --aufs for performance
> reasons. It's used in
> > 3 places:
> >
> > 1. cr-dump.c/dump_one_task()
> > - to collect AUFS branch pathnames and free them when done
> > 2. files.c/fill_fdlink()
> > - to replace pathnames in branches with pathnames from root
> > 3. proc_parse.c/parse_smaps()
> > - to get the path for stat() instead of fstat()
> >
>
> Ah, I see. I was thinking that we can make CRIU automatically detect, that
> it
> should do paths fixups, but since we'd have to specify --aufs-root anyway,
> then OK.
>
When I have a bit more time, I will work on this and send a patch to
automatically detect AUFS.
> >
> > > + }
> > > +
> >
> > > +int parse_mountinfo_aufs_sbinfo(pid_t pid, char *sbinfo, int len)
> > > +{
> > > + char line[PATH_MAX];
> > > + char *fstype = NULL;
> > > + char *opt = NULL;
> > > + char *cp;
> > > + int n, ret;
> > > +
> > > + n = get_mntent_by_mountpoint(pid, "/", line, sizeof line);
> > > + if (n < 0)
> > > + return -1;
> > > +
> >
> > Presumably, this would get fixed by rework of mountinfo's parser,
> right?
> >
> >
> > I already reworked the callback in mountinfo's parser which was in the
> previous patch (see
> > proc_parse.c/aufs_parse()). This is different as it's not changing
> anything in the mountinfo
> > entry. It's just getting the sbinfo string from mountinfo root entry to
> collect AUFS branch
> > information.
> >
> > dump_one_task()
> > if (opts.aufs) parse_aufs_branches()
> > parse_mountinf_aufs_sbinfo()
> > get_mntent_by_mountpoint()
> >
> > It may be possible to add another call back, .post_parse, and rework
> mountinfo parsing. But
> > .post_parse will only be used by used by AUFS and this code is temporary
> anyway. It will go
> > away as soon as AUFS bug is fixed. So, if you're OK, let's keep it like
> this. For non-AUFS
> > filesystems, this code isn't even called.
>
> Yes, I agree. What bothers me is that we do mountinfo-s parsing multiple
> times and introduce two independent parser codes for that. I'd prefer if
> we make one-pass scanning.
>
I reworked this part too via the .parse callback so there are no additional
mountinfo parsings :)
>
> >
> >
> > > +int aufs_parse(struct mount_info *new)
> > > +{
> > > + char *cp;
> > > +
> > > + if (!opts.aufs_root || strcmp(new->mountpoint, "./"))
> >
> > If we've met AUFS, but no --aufs-root specified, shouldn't we fail
> the dump?
> >
> >
> > The reason it doesn't return an error is that as soon as the mountinfo
> entry for root is
> > fixed (i.e., shows the root pathname instead of just /), we won't have
> to specify --aufs-root
> > anymore and shouldn't fail in aufs_parse().
> >
> >
> >
> >
> > > + return 0;
> > > +
> > > + cp = malloc(strlen(opts.aufs_root) + 1);
> > > + if (!cp) {
> > > + pr_err("Cannot allocate memory for %s\n",
> opts.aufs_root);
> > > + return -1;
> > > + }
> > > + strcpy(cp, opts.aufs_root);
> > > +
> > > + pr_debug("Replacing %s with %s\n", new->root,
> opts.aufs_root);
> > > + free(new->root);
> > > + new->root = cp;
> >
> > What if we have more than one AUFS mountpoint with different roots?
> I'm
> > OK if we only support one, but then we should fail the 2nd
> mountpoint met.
> >
> >
> > Here, we're replacing the root of the container which by definition is
> only 1. But if different
> > processes in the process tree have different AUFS mounts, their
> respective branch pathnames will
> > be handed one by one in parse_aufs_branches() when called from
> dump_one_task().
>
> I meant -- what if one process sees two different AUFS mounts.
>
I think it should work because the branches are set for each nsid. This is
another thing that we can address as time allows going forward.
>
> > > +
> > > + return 0;
> > > +}
> >
> >
> > Thanks,
> > Pavel
> >
> >
> > Thanks again for your feedback. I hope the answers are clear. If it's
> easier or more productive,
> > I am available on video chat to discuss in greater detail. My schedule
> is flexible, so time
> > difference shouldn't be a problem.
>
> OK, I'll keep that in mind :) Your timezone is US Pacific one, right?
>
Yes, I am in Mountain View.
Looking forward to your comments on this patch which I think has addressed
everything above.
Cheers!
--Saied
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openvz.org/pipermail/criu/attachments/20140819/b487d033/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Added-AUFS-support.patch
Type: application/octet-stream
Size: 16573 bytes
Desc: not available
URL: <http://lists.openvz.org/pipermail/criu/attachments/20140819/b487d033/attachment-0001.obj>
More information about the CRIU
mailing list