[CRIU] [PATCH] Put a cap on the size of a single preadv in the restore operation.
Mike Rapoport
rppt at linux.ibm.com
Thu Feb 7 10:16:48 MSK 2019
On Wed, Feb 06, 2019 at 10:17:36PM -0800, Andrei Vagin wrote:
> On Tue, Feb 05, 2019 at 08:13:25PM +0100, Pawel Stradomski wrote:
> > When image files are stored on tmpfs, --auto-dedup can be used to free the space occupied
> > by the image files via fallocate() after their data has been copied into the restored process.
> >
> > Temporarily (after preadv but before fallocate) the same data is present in both places,
> > increasing memory usage. By default, preadv() may read up to 2GiB in one go, which is a
> > significant overhead.
> >
> > This change caps the size of a single read at 100MiB, which is a much more reasonable overhead.
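For scale: the 2GiB figure comes from the kernel's per-syscall read cap (MAX_RW_COUNT,
roughly INT_MAX rounded down to a page). Restoring 2GiB of pages from a tmpfs image can
therefore temporarily hold about 4GiB, 2GiB still in the image plus 2GiB already copied
into the task, until fallocate() punches the hole. With a 100MiB cap the duplicated
window is at most about 100MiB per iteration.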
>
> Maybe it is better to add an option to specify this limit?
+1
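If it does become an option, the value could be handed to the restorer blob through
task_restore_args instead of being hard-coded, so the PIE code never has to parse
anything itself. A completely untested sketch, with the limit taken as a parameter;
the field/option names mentioned below are made up:

#include <stddef.h>
#include <sys/uio.h>

/*
 * Same logic as in the patch, but the cap comes from the caller
 * (e.g. a hypothetical args->read_chunk_size filled in from a new
 * command-line option); read_limit == 0 means "no cap".
 */
static int limit_iovec_size(struct iovec *iovs, int nr, size_t read_limit,
			    size_t *saved_last_iov_len)
{
	size_t remaining = read_limit ? read_limit : (size_t)-1;
	int limited_nr = 0, i;

	*saved_last_iov_len = 0;

	for (i = 0; i < nr; i++) {
		if (iovs[i].iov_len > remaining)
			break;
		remaining -= iovs[i].iov_len;
		limited_nr++;
	}

	/* Partially read the next iov if some budget is left over. */
	if (limited_nr < nr && remaining > 0) {
		*saved_last_iov_len = iovs[limited_nr].iov_len;
		iovs[limited_nr].iov_len = remaining;
		limited_nr++;
	}

	return limited_nr;
}

The call in __export_restore_task() would then pass args->read_chunk_size (or whatever
the field ends up being called) instead of the constant.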
> >
> > Signed-off-by: Pawel Stradomski <pstradomski at google.com>
> > ---
> > criu/pie/restorer.c | 41 +++++++++++++++++++++++++++++++++++++++--
> > 1 file changed, 39 insertions(+), 2 deletions(-)
I think it would be better to build ta->vma_ios so that each preadv uses no more than
the limit, and to add the limit to 'struct task_restore_args'.
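Roughly, the prepare side could lay out the iovecs in batches that already respect the
limit, so the restorer blob stays dumb. A very rough, untested sketch; the batch struct
below is a simplified stand-in, not the real restore_vma_io layout:

#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

struct vma_io_batch {
	int		nr_iovs;	/* iovecs handed to one preadv() */
	off_t		off;		/* offset in the pages image where the batch starts */
	struct iovec	*iovs;		/* points into the original iovec array */
};

/*
 * Walk iovs[0..nr) and emit batches whose total length does not exceed
 * read_limit.  An iovec that alone is larger than the limit still forms
 * a batch of its own, since a single iovec cannot be split here.
 * Returns the number of batches written to out[] (at most max_out).
 */
static int build_capped_batches(struct iovec *iovs, int nr, off_t off,
				size_t read_limit,
				struct vma_io_batch *out, int max_out)
{
	int i = 0, nr_batches = 0;

	while (i < nr && nr_batches < max_out) {
		struct vma_io_batch *b = &out[nr_batches++];
		size_t total = 0;

		b->iovs = &iovs[i];
		b->off = off;
		b->nr_iovs = 0;

		do {
			total += iovs[i].iov_len;
			b->nr_iovs++;
			i++;
		} while (i < nr && total + iovs[i].iov_len <= read_limit);

		/* Assumes the data for consecutive iovecs is contiguous in the image. */
		off += total;
	}

	return nr_batches;
}

With something like that, the loop in the restorer just iterates over ready-made batches,
and no more than read_limit bytes are ever duplicated between tmpfs and process memory
at a time.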
> >
> > diff --git a/criu/pie/restorer.c b/criu/pie/restorer.c
> > index d3b459c6..60859294 100644
> > --- a/criu/pie/restorer.c
> > +++ b/criu/pie/restorer.c
> > @@ -1219,6 +1219,37 @@ static bool vdso_needs_parking(struct task_restore_args *args)
> > return !vdso_unmapped(args);
> > }
> >
> > +/* Return the number of elements that can be read from iovs in one preadv
> > + * without exceeding the cap on the read size. May shrink the last element
> > + * to make it fit; in that case the original size is saved to saved_last_iov_len,
> > + * otherwise saved_last_iov_len is set to 0.
> > + *
> > + * We want to cap the size of one preadv because the code below does a preadv to
> > + * read data from image files (possibly on tmpfs) and then calls fallocate() to
> > + * free up space on that tmpfs. Thus temporarily the same data is on both tmpfs
> > + * and in process memory, adding memory overhead to the restore process.
> > + */
> > +static int limit_iovec_size(struct iovec *iovs, int nr, size_t *saved_last_iov_len) {
> > + size_t remaining_read_limit = 100 * (1 << 20);
> > + int limited_nr = 0;
> > + for (int i = 0; i < nr; ++i) {
> > + if (iovs[i].iov_len > remaining_read_limit) {
> > + break;
> > + }
> > + remaining_read_limit -= iovs[i].iov_len;
> > + limited_nr++;
> > + }
> > +
> > + /* Try to do a partial read of the last iov. */
> > + *saved_last_iov_len = 0;
> > + if (limited_nr < nr && remaining_read_limit > 0) {
> > + *saved_last_iov_len = iovs[limited_nr].iov_len;
> > + iovs[limited_nr].iov_len = remaining_read_limit;
> > + limited_nr++;
> > + }
> > + return limited_nr;
> > +}
> > +
> > /*
> > * The main routine to restore task via sigreturn.
> > * This one is very special, we never return there
> > @@ -1389,10 +1420,16 @@ long __export_restore_task(struct task_restore_args *args)
> > ssize_t r;
> >
> > while (nr) {
> > + size_t saved_last_iov_len = 0;
> > + int nr_in_one_pread = limit_iovec_size(iovs, nr, &saved_last_iov_len);
> > pr_debug("Preadv %lx:%d... (%d iovs)\n",
> > (unsigned long)iovs->iov_base,
> > - (int)iovs->iov_len, nr);
> > - r = sys_preadv(args->vma_ios_fd, iovs, nr, rio->off);
> > + (int)iovs->iov_len, nr_in_one_pread);
> > + r = sys_preadv(args->vma_ios_fd, iovs, nr_in_one_pread, rio->off);
> > + /* Restore the iov_len we had overwritten */
> > + if (saved_last_iov_len > 0) {
> > + iovs[nr_in_one_pread-1].iov_len = saved_last_iov_len;
> > + }
> > if (r < 0) {
> > pr_err("Can't read pages data (%d)\n", (int)r);
> > goto core_restore_end;
> > --
> > 2.20.1.611.gfbb209baf1-goog
> >
>
--
Sincerely yours,
Mike.