[CRIU] [PATCH] Put a cap on the size of single preadv in restore operation.
Andrei Vagin
avagin at gmail.com
Thu Feb 7 09:17:36 MSK 2019
On Tue, Feb 05, 2019 at 08:13:25PM +0100, Pawel Stradomski wrote:
> When image files are stored on tmpfs, --auto-dedup makes CRIU call fallocate() to free
> the space used by image files once the data has been copied to the restored process.
>
> Temporarily (after preadv() but before fallocate()) the same data is present in both places,
> increasing memory usage. By default a single preadv() can read up to 2GiB, which is a significant
> overhead.
>
> This change caps the size of a single read at 100MiB, which is a much more reasonable overhead.
Maybe it would be better to add an option to specify this limit?
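A minimal sketch of that idea, assuming the cap is simply passed to the helper as an argument. The name limit_iovec_size_capped and the cap parameter are hypothetical and not part of the patch; how the value would reach the restorer (a command-line option, a field in the restore arguments, or something else) is left open here. A cap of 0 is treated as "no limit" purely as an illustrative convention.

```c
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical variant of limit_iovec_size() from the patch: the cap is an
 * argument instead of a hard-coded 100 MiB. A cap of 0 means "no limit".
 *
 * Returns how many iovecs to pass to one preadv. If the last of them had to
 * be shortened, its original length is stored in *saved_last_iov_len so the
 * caller can restore it after the read; otherwise 0 is stored.
 */
static int limit_iovec_size_capped(struct iovec *iovs, int nr, size_t cap,
				   size_t *saved_last_iov_len)
{
	size_t remaining = cap;
	int limited_nr = 0;

	*saved_last_iov_len = 0;
	if (cap == 0)
		return nr;

	/* Take whole iovecs while they still fit under the cap. */
	while (limited_nr < nr && iovs[limited_nr].iov_len <= remaining) {
		remaining -= iovs[limited_nr].iov_len;
		limited_nr++;
	}

	/* Use up the rest of the cap with a partial read of the next iovec. */
	if (limited_nr < nr && remaining > 0) {
		*saved_last_iov_len = iovs[limited_nr].iov_len;
		iovs[limited_nr].iov_len = remaining;
		limited_nr++;
	}
	return limited_nr;
}
```

With a non-zero cap the function always returns at least 1 for a non-empty array, so the read loop keeps making progress even when a single iovec is larger than the cap.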
>
> Signed-off-by: Pawel Stradomski <pstradomski at google.com>
> ---
> criu/pie/restorer.c | 41 +++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 39 insertions(+), 2 deletions(-)
>
> diff --git a/criu/pie/restorer.c b/criu/pie/restorer.c
> index d3b459c6..60859294 100644
> --- a/criu/pie/restorer.c
> +++ b/criu/pie/restorer.c
> @@ -1219,6 +1219,37 @@ static bool vdso_needs_parking(struct task_restore_args *args)
> return !vdso_unmapped(args);
> }
>
> +/* Return the number of elements of iovs that can be read in one preadv
> + * without exceeding the cap on read size. May shrink the last returned element
> + * to make it fit; in that case its original size is saved to saved_last_iov_len.
> + * Otherwise saved_last_iov_len is set to 0.
> + *
> + * We want to cap the size of one preadv because the code below does a preadv to
> + * read data from image files (possibly on tmpfs) and then calls fallocate() to
> + * free up space on that tmpfs. Thus temporarily the same data is on both tmpfs
> + * and in process memory, adding memory overhead to the restore process.
> + */
> +static int limit_iovec_size(struct iovec *iovs, int nr, size_t* saved_last_iov_len) {
> +	size_t remaining_read_limit = 100 * (1 << 20);
> +	int limited_nr = 0;
> +	for (int i = 0; i < nr; ++i) {
> +		if (iovs[i].iov_len > remaining_read_limit) {
> +			break;
> +		}
> +		remaining_read_limit -= iovs[i].iov_len;
> +		limited_nr++;
> +	}
> +
> +	/* Try to do a partial read of the last iov. */
> +	*saved_last_iov_len = 0;
> +	if (limited_nr < nr && remaining_read_limit > 0) {
> +		*saved_last_iov_len = iovs[limited_nr].iov_len;
> +		iovs[limited_nr].iov_len = remaining_read_limit;
> +		limited_nr++;
> +	}
> +	return limited_nr;
> +}
> +
> /*
> * The main routine to restore task via sigreturn.
> * This one is very special, we never return there
> @@ -1389,10 +1420,16 @@ long __export_restore_task(struct task_restore_args *args)
> ssize_t r;
>
> while (nr) {
> + size_t saved_last_iov_len = 0;
> + int nr_in_one_pread = limit_iovec_size(iovs, nr, &saved_last_iov_len);
> pr_debug("Preadv %lx:%d... (%d iovs)\n",
> (unsigned long)iovs->iov_base,
> - (int)iovs->iov_len, nr);
> - r = sys_preadv(args->vma_ios_fd, iovs, nr, rio->off);
> + (int)iovs->iov_len, nr_in_one_pread);
> + r = sys_preadv(args->vma_ios_fd, iovs, nr_in_one_pread, rio->off);
> + /* Restore the iov_len we had overwritten */
> + if (saved_last_iov_len > 0) {
> + iovs[nr_in_one_pread-1].iov_len = saved_last_iov_len;
> + }
> if (r < 0) {
> pr_err("Can't read pages data (%d)\n", (int)r);
> goto core_restore_end;
> --
> 2.20.1.611.gfbb209baf1-goog
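For comparison, here is a userspace sketch of an alternative way to cap each read: instead of shortening the caller's last iovec and restoring it afterwards, it copies a bounded prefix into a small local array and truncates the copy. This is not what the patch does and not CRIU code. build_batch, read_capped and BATCH_IOVS are hypothetical names, the sketch uses libc preadv() rather than the restorer's sys_preadv(), and it assumes cap > 0.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

#define BATCH_IOVS 16	/* hypothetical bound on iovecs per preadv call */

/*
 * Copy a prefix of iovs into out[] so that the copy covers at most cap
 * bytes, truncating the last copied element if needed. The caller's array
 * is left untouched. Returns the number of entries written to out[].
 */
static int build_batch(const struct iovec *iovs, int nr, size_t cap,
		       struct iovec *out)
{
	int n = 0;

	while (n < nr && n < BATCH_IOVS && cap > 0) {
		out[n] = iovs[n];
		if (out[n].iov_len > cap)
			out[n].iov_len = cap;
		cap -= out[n].iov_len;
		n++;
	}
	return n;
}

/*
 * Fill all of iovs from fd starting at off, reading at most cap bytes per
 * preadv(). Advances through iovs the same way a short read would.
 * Returns the total number of bytes read, or -1 on error, on unexpected
 * end of file, or if cap is 0.
 */
static ssize_t read_capped(int fd, struct iovec *iovs, int nr, off_t off,
			   size_t cap)
{
	struct iovec batch[BATCH_IOVS];
	ssize_t total = 0;

	if (cap == 0)
		return -1;

	while (nr > 0) {
		int n;
		ssize_t r;

		/* Nothing to read into an empty iovec; skip it. */
		if (iovs->iov_len == 0) {
			iovs++;
			nr--;
			continue;
		}

		n = build_batch(iovs, nr, cap, batch);
		r = preadv(fd, batch, n, off);
		if (r <= 0)
			return -1;

		off += r;
		total += r;

		/* Drop fully filled iovecs, then step into a partial one. */
		while (nr > 0 && (size_t)r >= iovs->iov_len) {
			r -= iovs->iov_len;
			iovs++;
			nr--;
		}
		if (nr > 0) {
			iovs->iov_base = (char *)iovs->iov_base + r;
			iovs->iov_len -= r;
		}
	}
	return total;
}
```

The trade-off is a small fixed-size array on the stack in exchange for never leaving the caller's iovecs in a temporarily modified state; whether that is acceptable in the restorer blob is a separate question.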