[CRIU] [PATCH v4 2/4] vm: add a syscall to map a process memory into a pipe
Andrew Morton
akpm at linux-foundation.org
Tue Nov 28 02:42:49 MSK 2017
On Mon, 27 Nov 2017 09:19:39 +0200 Mike Rapoport <rppt at linux.vnet.ibm.com> wrote:
> From: Andrei Vagin <avagin at virtuozzo.com>
>
> It is a hybrid of process_vm_readv() and vmsplice().
>
> vmsplice() can map memory from the current address space into a pipe.
> process_vm_readv() can read the memory of another process.
>
> The new system call can map the memory of another process into a pipe.
>
> ssize_t process_vmsplice(pid_t pid, int fd, const struct iovec *iov,
>                          unsigned long nr_segs, unsigned int flags)
>
> All arguments are identical to those of vmsplice() except pid, which
> specifies the target process.
>
> Currently, if we want to dump a process's memory to a file or a socket,
> we can use process_vm_readv() + write(), but this is slow because the
> data is copied through a temporary user-space buffer.
>
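For reference, the process_vm_readv() + write() approach described above can be sketched roughly as follows. This is only an illustration, not code from the patch: the helper name dump_via_readv() and the 4096-byte buffer size are assumptions made for the example. Every byte is first copied into buf and then copied again by write(), which is the extra copy being discussed.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy len bytes starting at remote_addr in process
 * pid to out_fd, staging the data in a temporary user-space buffer.
 * Returns the number of bytes written, or -1 on error (errno is set).
 */
static ssize_t dump_via_readv(pid_t pid, void *remote_addr, size_t len,
                              int out_fd)
{
    char buf[4096];   /* temporary buffer; the size is an arbitrary choice */
    size_t done = 0;

    assert(out_fd >= 0);

    while (done < len) {
        size_t chunk = len - done < sizeof(buf) ? len - done : sizeof(buf);
        struct iovec local = { .iov_base = buf, .iov_len = chunk };
        struct iovec remote = {
            .iov_base = (char *)remote_addr + done,
            .iov_len = chunk,
        };

        /* First copy: target process memory -> our buffer. */
        ssize_t n = process_vm_readv(pid, &local, 1, &remote, 1, 0);
        if (n <= 0)
            return -1;

        /* Second copy: our buffer -> file/socket. write() may be short. */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

Reading another process this way needs ptrace-level access to it; passing getpid() as pid is enough to try the helper out.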
> A second way is to use vmsplice() + splice(). It is more efficient
> because the data is not copied into a temporary buffer, but it has
> another problem: vmsplice() works with the current address space, so it
> can be used only if we inject our code into the target process.
>
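For comparison, a minimal sketch of the vmsplice() + splice() path, as it would have to run inside the process that owns the memory. The helper name dump_via_vmsplice() is an assumption made for this example and is not part of the patch. The pages are attached to a pipe and then moved on to out_fd, with no intermediate user-space buffer.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: send len bytes at addr (in the calling process) to
 * out_fd through a pipe. Returns the number of bytes sent, or -1 on error.
 * The memory must not be modified until the data has been consumed.
 */
static ssize_t dump_via_vmsplice(void *addr, size_t len, int out_fd)
{
    int p[2];
    size_t done = 0;

    assert(out_fd >= 0);

    if (pipe(p) < 0)
        return -1;

    while (done < len) {
        struct iovec iov = {
            .iov_base = (char *)addr + done,
            .iov_len = len - done,
        };

        /* Attach user pages to the pipe; this may take fewer bytes than
         * requested once the pipe is full. */
        ssize_t in = vmsplice(p[1], &iov, 1, 0);
        if (in <= 0)
            goto err;

        /* Drain the pipe into the destination before refilling it. */
        ssize_t left = in;
        while (left > 0) {
            ssize_t out = splice(p[0], NULL, out_fd, NULL, (size_t)left,
                                 SPLICE_F_MOVE);
            if (out <= 0)
                goto err;
            left -= out;
        }
        done += (size_t)in;
    }

    close(p[0]);
    close(p[1]);
    return (ssize_t)done;

err:
    close(p[0]);
    close(p[1]);
    return -1;
}
```

Because vmsplice() takes no pid argument, this helper can only dump the caller's own memory, which is why the existing approach has to inject code into the target.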
> The second way suffers from a few other issues:
> * the process has to be stopped to run parasite code
> * the number of pipes is limited, so it may be impossible to dump all
> memory in one iteration, and we have to stop the process and inject
> our code several times.
> * pages in pipes are unreclaimable, so it isn't good to hold a lot of
> memory in pipes.
>
> The new syscall makes it possible to use the second way without
> injecting any code into the target process.
>
> My experiments show that process_vmsplice() + splice() works twice as
> fast as process_vm_readv() + write().
>
> It is particularly useful in the pre-dump stage. In this stage we enable
> a memory tracker and then dump the process's memory while the process
> continues to run. On the first iteration we dump all memory, and on each
> later iteration we dump only the memory modified since the previous
> iteration. After a few pre-dump operations, the process is stopped and
> dumped for the final time. The pre-dump operations significantly reduce
> process downtime when a process is migrated to another host.
What is the overall improvement in a typical dumping operation?
Does that improvement justify the addition of a new syscall, and all
that this entails? If so, why?
Are there any other applications of this syscall?