[CRIU] AUFS Support in CRIU

Pavel Emelyanov xemul at parallels.com
Tue Aug 19 03:03:38 PDT 2014


On 08/18/2014 08:43 PM, Saied Kazemi wrote:

>     > @@ -197,6 +199,20 @@ int fill_fdlink(int lfd, const struct fd_parms *p, struct fd_link *link)
>     >               return -1;
>     >       }
>     >
>     > +     /*
>     > +      * If the link file descriptor (lfd) points to a
>     > +      * file in an AUFS branch, the pathname should be
>     > +      * replaced with a pathname from root.
>     > +      */
>     > +     if (opts.aufs) {
>     > +             int n = sizeof link->name - 1;
>     > +             n = fixup_aufs_path(&link->name[1], n, true);
> 
>     Since you say, that we only see spoiled paths in map_files, do we need
>     the path fixup here? Isn't the one performed in parsing the mappings
>     not enough?
> 
> 
> We need this logic here because the link->name is in an AUFS branch.  For example, these are the paths that are passed to fill_fdlink() during my test:
> 
> /dev/null
> /var/lib/docker/aufs/diff/<ID>/bin/busybox
> /var/lib/docker/aufs/diff/<ID>/lib/libuClibc-0.9.33.2.so <http://libuClibc-0.9.33.2.so>
> /var/lib/docker/aufs/diff/<ID>/lib/ld64-uClibc-0.9.33.2.so <http://ld64-uClibc-0.9.33.2.so>
> /

OK, the only way why spoiled paths leak there is through the dump_filemap(),
and the file we want to dump there was already seen in parse_smaps(). Can we
make path fixup only once? I think we can 

* fixup the path in parse_smaps()
* put one on the vma_area() (can be unione-ed with page_bitmap, it's restore field)
* then in dump_filemap() put one as the link pointer on fdparms


after this the if (!p->link) check in dump_one_reg_file() would avoid calling 
the fill_fdlink.

What do you think?

> 
> 
>     > +             if (n < 0)
>     > +                     return -1;
>     > +             if (n > 0)
>     > +                     len = n;
>     > +     }
>     > +
>     >       link->len = len + 1;
>     >       return 0;
>     >  }
> 
>     > +int fixup_aufs_path(char *path, int size, bool chop)
>     > +{
>     > +     char rpath[PATH_MAX];
>     > +     int n;
>     > +     int blen;
>     > +
>     > +     if (aufs_branches == NULL) {
>     > +             pr_err("No aufs branches to search for %s\n", path);
>     > +             return -1;
> 
>     Presumably, if we don't have AUFS we shouldn't fail :)
> 
> 
> If we don't have AUFS, we should not specify --aufs and, therefore, we should not be calling
> fixup_aufs_path().  So, if we get to fixup_aufs_path() but don't have branch information, it's 
> an error.
> 
> Alternatively, we can decide not to specify --aufs and call AUFS code regardless, returning 
> without error if there's no AUFS.  But I added --aufs for performance reasons.  It's used in 
> 3 places:
> 
> 1. cr-dump.c/dump_one_task()
>    - to collect AUFS branch pathnames and free them when done
> 2. files.c/fill_fdlink()
>    - to replace pathnames in branches with pathnames from root
> 3. proc_parse.c/parse_smaps()
>    - to get the path for stat() instead of fstat()
> 

Ah, I see. I was thinking that we can make CRIU automatically detect, that it
should do paths fixups, but since we'd have to specify --aufs-root anyway, then OK.

> 
>     > +     }
>     > +
> 
>     > +int parse_mountinfo_aufs_sbinfo(pid_t pid, char *sbinfo, int len)
>     > +{
>     > +     char line[PATH_MAX];
>     > +     char *fstype = NULL;
>     > +     char *opt = NULL;
>     > +     char *cp;
>     > +     int n, ret;
>     > +
>     > +     n = get_mntent_by_mountpoint(pid, "/", line, sizeof line);
>     > +     if (n < 0)
>     > +             return -1;
>     > +
> 
>     Presumably, this would get fixed by rework of mountinfo's parser, right?
> 
> 
> I already reworked the callback in mountinfo's parser which was in the previous patch (see 
> proc_parse.c/aufs_parse()).  This is different as it's not changing anything in the mountinfo
> entry.  It's just getting the sbinfo string from mountinfo root entry to collect AUFS branch
> information.  
> 
> dump_one_task()
>    if (opts.aufs) parse_aufs_branches()
>       parse_mountinf_aufs_sbinfo()
>          get_mntent_by_mountpoint()
> 
> It may be possible to add another call back, .post_parse, and rework mountinfo parsing.  But 
> .post_parse will only be used by used by AUFS and this code is temporary anyway.  It will go
> away as soon as AUFS bug is fixed.  So, if you're OK, let's keep it like this.  For non-AUFS
> filesystems, this code isn't even called.

Yes, I agree. What bothers me is that we do mountinfo-s parsing multiple
times and introduce two independent parser codes for that. I'd prefer if
we make one-pass scanning.

> 
> 
>     > +int aufs_parse(struct mount_info *new)
>     > +{
>     > +     char *cp;
>     > +
>     > +     if (!opts.aufs_root || strcmp(new->mountpoint, "./"))
> 
>     If we've met AUFS, but no --aufs-root specified, shouldn't we fail the dump?
> 
> 
> The reason it doesn't return an error is that as soon as the mountinfo entry for root is
> fixed (i.e., shows the root pathname instead of just /), we won't have to specify --aufs-root 
> anymore and shouldn't fail in aufs_parse().
> 
>  
> 
> 
>     > +             return 0;
>     > +
>     > +     cp = malloc(strlen(opts.aufs_root) + 1);
>     > +     if (!cp) {
>     > +             pr_err("Cannot allocate memory for %s\n", opts.aufs_root);
>     > +             return -1;
>     > +     }
>     > +     strcpy(cp, opts.aufs_root);
>     > +
>     > +     pr_debug("Replacing %s with %s\n", new->root, opts.aufs_root);
>     > +     free(new->root);
>     > +     new->root = cp;
> 
>     What if we have more than one AUFS mountpoint with different roots? I'm
>     OK if we only support one, but then we should fail the 2nd mountpoint met.
> 
> 
> Here, we're replacing the root of the container which by definition is only 1.  But if different
> processes in the process tree have different AUFS mounts, their respective branch pathnames will
> be handed one by one in parse_aufs_branches() when called from dump_one_task().

I meant -- what if one process sees two different AUFS mounts.

>     > +
>     > +     return 0;
>     > +}
> 
> 
>     Thanks,
>     Pavel
> 
> 
> Thanks again for your feedback.  I hope the answers are clear.  If it's easier or more productive,
> I am available on video chat to discuss in greater detail.  My schedule is flexible, so time 
> difference shouldn't be a problem.

OK, I'll keep that in mind :) Your timezone is US Pacific one, right?

Thanks,
Pavel



More information about the CRIU mailing list