[CRIU] [PATCH 1/5] bfd: File-descriptors based buffered read
Pavel Emelyanov
xemul at parallels.com
Mon Sep 22 00:56:02 PDT 2014
On 09/19/2014 07:39 PM, Christopher Covington wrote:
> On 09/19/2014 09:30 AM, Pavel Emelyanov wrote:
>> This sounds strange, but we kinda need one. Here's the
>> justification for that.
>>
>> We heavily open /proc/pid/foo files. To speed things up we
>> do pid_dir = open("/proc/pid") then openat(pid_dir, foo).
>> This really saves time on big trees, up to 10%.
>>
>> Sometimes we need line-by-line scan of these files, and for
>> that we currently use the fdopen() call. It takes a file
>> descriptor (obtained with openat from above) and wraps one
>> into a FILE*.
>>
>> The problem with the latter is that fdopen _always_ mmap()s
>> a buffer for reads and this buffer always (!) gets unmapped
>> back on fclose(). This pair of mmap() + munmap() eats time
>> on big trees, up to 10% in my experiments with p.haul tests.
>>
>> The situation is made even worse by the fact that each fgets
>> on the file results in a new page allocated in the kernel
>> (since the mapping is new). And also this fgets copies data,
>> which is not big deal, but for e.g. smaps file this results
>> in ~8K bytes being just copied around.
>>
>>
>> Having said that, here's a small but fast way of reading a
>> descriptor line-by-line using big buffer for reducing the
>> amount of read()s.
>>
>> After all per-task fopen_proc()-s get reworked on this engine
>> (next 4 patches) the results on p.haul test would be
>>
>> Syscall Calls Time (% of time)
>> Now:
>> mmap: 463 0.012033 (3.2%)
>> munmap: 447 0.014473 (3.9%)
>> Patched:
>> munmap: 57 0.002106 (0.6%)
>> mmap: 74 0.002286 (0.7%)
>>
>> The amount of read()s and open()s doesn't change since FILE*
>> also uses page-sized buffer for reading.
>>
>> Also this eliminates some amount of lseek()s and fstat()s
>> the fdopen() does every time to catch up with file position
>> and to determine what sort of buffering it should use (for
>> terminals it's \n-driven, for files it's not).
>>
>> Signed-off-by: Pavel Emelyanov <xemul at parallels.com>
>> ---
>> Makefile.crtools | 1 +
>> bfd.c | 156 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> include/bfd.h | 18 +++++++
>> 3 files changed, 175 insertions(+)
>> create mode 100644 bfd.c
>> create mode 100644 include/bfd.h
>>
>> diff --git a/Makefile.crtools b/Makefile.crtools
>> index 2a6d4d2..7f21d25 100644
>> --- a/Makefile.crtools
>> +++ b/Makefile.crtools
>> @@ -15,6 +15,7 @@ obj-y += cr-show.o
>> obj-y += cr-check.o
>> obj-y += cr-dedup.o
>> obj-y += util.o
>> +obj-y += bfd.o
>> obj-y += action-scripts.o
>> obj-y += sysctl.o
>> obj-y += ptrace.o
>> diff --git a/bfd.c b/bfd.c
>> new file mode 100644
>> index 0000000..2c1faae
>> --- /dev/null
>> +++ b/bfd.c
>> @@ -0,0 +1,156 @@
>> +#include <unistd.h>
>> +#include <stdbool.h>
>> +#include <string.h>
>> +#include <stdio.h>
>> +#include <sys/mman.h>
>> +#include <fcntl.h>
>> +
>> +#include "log.h"
>> +#include "bfd.h"
>> +
>> +/*
>> + * Kernel doesn't produce more than one page of
>> + * date per one read call on proc files.
>> + */
>> +#define BUFSIZE (4096)
>
> How about using PAGE_SIZE here?
>
> On AArch64, the normal page size may be 64KiB rather than 4KiB. (The
> architecture support 16KiB as well but it's not in Linux yet.) I've got some
> local hacks that I need to clean up but basically, I think for runtime usage
> of PAGE_SIZE, sysconf(_SC_PAGESIZE) should be used, and any remaining
> build-time usage of PAGE_SIZE can probably be replaced with EXEC_PAGESIZE from
> asm/param.h which seems to be maximum possible page size for the architecture.
Yup, nice suggestion. Will fix :)
Thanks,
Pavel
More information about the CRIU
mailing list