[CRIU] [PATCH 1/5] bfd: File-descriptors based buffered read

Pavel Emelyanov xemul at parallels.com
Mon Sep 22 00:56:02 PDT 2014


On 09/19/2014 07:39 PM, Christopher Covington wrote:
> On 09/19/2014 09:30 AM, Pavel Emelyanov wrote:
>> This sounds strange, but we kinda need one. Here's the
>> justification for that.
>>
>> We open /proc/pid/foo files heavily. To speed things up we
>> do pid_dir = open("/proc/pid"), then openat(pid_dir, foo).
>> This really saves time on big trees, up to 10%.
>>
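For illustration, the open()/openat() pattern above looks roughly like this (a minimal sketch with a made-up pid; needs <fcntl.h>, error handling omitted):

	int pid_dir, fd;

	/* Open the per-task proc directory once... */
	pid_dir = open("/proc/1234", O_RDONLY | O_DIRECTORY);
	/* ...then resolve individual files relative to it. */
	fd = openat(pid_dir, "smaps", O_RDONLY);
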
>> Sometimes we need a line-by-line scan of these files, and for
>> that we currently use the fdopen() call. It takes a file
>> descriptor (obtained with openat() as above) and wraps it
>> into a FILE*.
>>
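The fdopen() path being replaced looks roughly like this (a sketch; error handling omitted):

	FILE *f = fdopen(fd, "r");	/* fd comes from openat() above */
	char line[128];

	while (fgets(line, sizeof(line), f))
		; /* parse one line of /proc/pid/foo */
	fclose(f);	/* unmaps the stdio buffer again */
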
>> The problem with the latter is that fdopen() _always_ mmap()s
>> a buffer for reads, and this buffer always (!) gets unmapped
>> again on fclose(). This mmap() + munmap() pair eats time on
>> big trees, up to 10% in my experiments with the p.haul tests.
>>
>> The situation is made even worse by the fact that each fgets()
>> on the file results in a new page being allocated in the kernel
>> (since the mapping is new). This fgets() also copies the data,
>> which is not a big deal, but for e.g. the smaps file it means
>> ~8KB of data being just copied around.
>>
>>
>> Having said that, here's a small but fast engine for reading a
>> descriptor line-by-line, using a big buffer to reduce the
>> number of read()s.
>>
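The patch's bfd.c has the real implementation; purely as a sketch of the idea (hypothetical and simplified, not the code from this patch), a read()-backed line reader might look like this:

#include <string.h>
#include <unistd.h>

struct lbuf {
	int	fd;
	size_t	pos, len;	/* unconsumed data lives in buf[pos..len) */
	char	buf[4096];
};

/*
 * Return the next '\n'-terminated line with the '\n' replaced by '\0',
 * or NULL on EOF/error.  The buffer is refilled with plain read()s,
 * so no FILE* and no extra mmap()ed buffer is involved.
 */
static char *lbuf_readline(struct lbuf *b)
{
	while (1) {
		char *nl = memchr(b->buf + b->pos, '\n', b->len - b->pos);
		ssize_t r;

		if (nl) {
			char *line = b->buf + b->pos;

			*nl = '\0';
			b->pos = nl - b->buf + 1;
			return line;
		}

		/* No complete line buffered -- compact and refill */
		memmove(b->buf, b->buf + b->pos, b->len - b->pos);
		b->len -= b->pos;
		b->pos = 0;

		r = read(b->fd, b->buf + b->len, sizeof(b->buf) - b->len);
		if (r <= 0)
			return NULL;	/* EOF/error; a partial last line or an over-long line is not handled in this sketch */

		b->len += r;
	}
}

A caller would zero pos and len, put the descriptor from openat() into fd, and call lbuf_readline() until it returns NULL.
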
>> After all per-task fopen_proc()s are reworked onto this engine
>> (next 4 patches), the results on the p.haul test would be:
>>
>>         Syscall     Calls      Time (% of time)
>> Now:
>>            mmap:      463  0.012033 (3.2%)
>>          munmap:      447  0.014473 (3.9%)
>> Patched:
>>          munmap:       57  0.002106 (0.6%)
>>            mmap:       74  0.002286 (0.7%)
>>
>> The number of read()s and open()s doesn't change, since FILE*
>> also uses a page-sized buffer for reading.
>>
>> This also eliminates the lseek()s and fstat()s that fdopen()
>> does every time to catch up with the file position and to
>> determine what sort of buffering it should use (for terminals
>> it's \n-driven, for regular files it's not).
>>
>> Signed-off-by: Pavel Emelyanov <xemul at parallels.com>
>> ---
>>  Makefile.crtools |   1 +
>>  bfd.c            | 156 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  include/bfd.h    |  18 +++++++
>>  3 files changed, 175 insertions(+)
>>  create mode 100644 bfd.c
>>  create mode 100644 include/bfd.h
>>
>> diff --git a/Makefile.crtools b/Makefile.crtools
>> index 2a6d4d2..7f21d25 100644
>> --- a/Makefile.crtools
>> +++ b/Makefile.crtools
>> @@ -15,6 +15,7 @@ obj-y	+= cr-show.o
>>  obj-y	+= cr-check.o
>>  obj-y	+= cr-dedup.o
>>  obj-y	+= util.o
>> +obj-y	+= bfd.o
>>  obj-y	+= action-scripts.o
>>  obj-y	+= sysctl.o
>>  obj-y	+= ptrace.o
>> diff --git a/bfd.c b/bfd.c
>> new file mode 100644
>> index 0000000..2c1faae
>> --- /dev/null
>> +++ b/bfd.c
>> @@ -0,0 +1,156 @@
>> +#include <unistd.h>
>> +#include <stdbool.h>
>> +#include <string.h>
>> +#include <stdio.h>
>> +#include <sys/mman.h>
>> +#include <fcntl.h>
>> +
>> +#include "log.h"
>> +#include "bfd.h"
>> +
>> +/*
>> + * The kernel doesn't produce more than one page of
>> + * data per read() call on proc files.
>> + */
>> +#define BUFSIZE	(4096)
> 
> How about using PAGE_SIZE here?
> 
> On AArch64, the normal page size may be 64KiB rather than 4KiB. (The
> architecture supports 16KiB as well, but that's not in Linux yet.) I've got
> some local hacks that I need to clean up, but basically I think that for
> runtime usage of PAGE_SIZE, sysconf(_SC_PAGESIZE) should be used, and any
> remaining build-time usage of PAGE_SIZE can probably be replaced with
> EXEC_PAGESIZE from asm/param.h, which seems to be the maximum possible
> page size for the architecture.


Yup, nice suggestion. Will fix :)
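
For the record, the runtime query could be as small as this (just a sketch, not necessarily what the final fix will look like):

#include <unistd.h>

static long bufsize(void)
{
	static long sz;

	if (!sz)
		sz = sysconf(_SC_PAGESIZE);	/* 4096 on x86, possibly 65536 on AArch64 */
	return sz;
}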

Thanks,
Pavel


