[CRIU] [PATCH 1/5] bfd: File-descriptors based buffered read

Christopher Covington cov at codeaurora.org
Fri Sep 19 08:39:58 PDT 2014


On 09/19/2014 09:30 AM, Pavel Emelyanov wrote:
> This sounds strange, but we kinda need one. Here's the
> justification for that.
> 
> We heavily open /proc/pid/foo files. To speed things up we
> do pid_dir = open("/proc/pid") then openat(pid_dir, foo).
> This really saves time on big trees, up to 10%.
> 
> Sometimes we need line-by-line scan of these files, and for
> that we currently use the fdopen() call. It takes a file
> descriptor (obtained with openat from above) and wraps one
> into a FILE*.
> 
> The problem with the latter is that fdopen _always_ mmap()s
> a buffer for reads and this buffer always (!) gets unmapped
> back on fclose(). This pair of mmap() + munmap() eats time
> on big trees, up to 10% in my experiments with p.haul tests.
> 
> The situation is made even worse by the fact that each fgets
> on the file results in a new page allocated in the kernel
> (since the mapping is new). And also this fgets copies data,
> which is not big deal, but for e.g. smaps file this results
> in ~8K bytes being just copied around.
> 
> 
> Having said that, here's a small but fast way of reading a
> descriptor line-by-line using big buffer for reducing the
> amount of read()s.
> 
> After all per-task fopen_proc()-s get reworked on this engine
> (next 4 patches) the results on p.haul test would be
> 
>         Syscall     Calls      Time (% of time)
> Now:
>            mmap:      463  0.012033 (3.2%)
>          munmap:      447  0.014473 (3.9%)
> Patched:
>          munmap:       57  0.002106 (0.6%)
>            mmap:       74  0.002286 (0.7%)
> 
> The amount of read()s and open()s doesn't change since FILE*
> also uses page-sized buffer for reading.
> 
> Also this eliminates some amount of lseek()s and fstat()s
> the fdopen() does every time to catch up with file position
> and to determine what sort of buffering it should use (for
> terminals it's \n-driven, for files it's not).
> 
> Signed-off-by: Pavel Emelyanov <xemul at parallels.com>
> ---
>  Makefile.crtools |   1 +
>  bfd.c            | 156 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  include/bfd.h    |  18 +++++++
>  3 files changed, 175 insertions(+)
>  create mode 100644 bfd.c
>  create mode 100644 include/bfd.h
> 
> diff --git a/Makefile.crtools b/Makefile.crtools
> index 2a6d4d2..7f21d25 100644
> --- a/Makefile.crtools
> +++ b/Makefile.crtools
> @@ -15,6 +15,7 @@ obj-y	+= cr-show.o
>  obj-y	+= cr-check.o
>  obj-y	+= cr-dedup.o
>  obj-y	+= util.o
> +obj-y	+= bfd.o
>  obj-y	+= action-scripts.o
>  obj-y	+= sysctl.o
>  obj-y	+= ptrace.o
> diff --git a/bfd.c b/bfd.c
> new file mode 100644
> index 0000000..2c1faae
> --- /dev/null
> +++ b/bfd.c
> @@ -0,0 +1,156 @@
> +#include <unistd.h>
> +#include <stdbool.h>
> +#include <string.h>
> +#include <stdio.h>
> +#include <sys/mman.h>
> +#include <fcntl.h>
> +
> +#include "log.h"
> +#include "bfd.h"
> +
> +/*
> + * Kernel doesn't produce more than one page of
> + * date per one read call on proc files.
> + */
> +#define BUFSIZE	(4096)

How about using PAGE_SIZE here?

On AArch64, the normal page size may be 64KiB rather than 4KiB. (The
architecture support 16KiB as well but it's not in Linux yet.) I've got some
local hacks that I need to clean up but basically, I think for runtime usage
of PAGE_SIZE, sysconf(_SC_PAGESIZE) should be used, and any remaining
build-time usage of PAGE_SIZE can probably be replaced with EXEC_PAGESIZE from
asm/param.h which seems to be maximum possible page size for the architecture.

Thanks,
Christopher

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.


More information about the CRIU mailing list