[CRIU] [PATCH 00/11] vDSO rework, part 3/3

Andrei Vagin avagin at virtuozzo.com
Fri Jul 21 17:33:10 MSK 2017


Applied, thanks
On Mon, Jul 17, 2017 at 03:39:51PM +0300, Dmitry Safonov wrote:
> Hi guys,
> 
> That's the last part of the set. It consists of proxification
> fixes: before patches 2-4, creating a proxy-vdso unmapped the
> original vvar image, and did so iff the vvar vma was placed
> after the vdso in the vma list. As a result, after several
> C/Rs with inserted trampolines, the virtual address space of the
> task was polluted with rt-vvar vmas from previous restores.
> 
> There is a test (7) for checking that after several C/Rs with
> inserted jump trampolines there is no pollution happening.
> 
> Besides, there are performance optimizations: by checking kernel
> APIs in kdat tests, we can skip some unnecessary work.
> 
> The only thing I did not address in this set, and that may go
> wrong, is the dropping of rt-vmas on further C/Rs. The application
> may be on the rt-vdso at the moment of dump, which seems racy.
> Nevertheless, this is not a regression of the patch set: the race
> was already present and should be addressed by a later fix.
> 
> Attaching original set's cover:
> 
> === Full description of the set ===
> 
> After facing a bug with mismatched pfn for vdso in vz7 CTs
> and proposing a way to solve it and make vdso C/R faster:
> https://lists.openvz.org/pipermail/criu/2017-June/037982.html
> 
> I started working on it. Then I moved the vdso symtable to the
> criu.kdat file (as I had previously proposed to Pasha)...
> and found that the vDSO code is not very readable, and another
> quick fix on top may roll the yacht over.
> 
> During the refactoring I also came across some bugs, which are now fixed:
> 1. Leaving vdso/vvar after restore in task which didn't have them.
> 2. Bug with unmapping original vvar vma on the second C/R if
>    jump trampolines & rt-vdso are used.
> 3. Keeping rt-vvar after each C/R with inserted jump trampolines.
> 4. Bug with ia32 unmapping vdso after trampolines insertion.
> 
> For (2) and (3), I introduced rt-vdso mark v3.
> 
> There are two new tests:
> o Task with unmapped vdso (vdso02)
> o Iterative C/R with inserting jump-trampolines (vdso-proxy)
> 
> Then the whole process of vdso C/R was reworked with the
> criu.kdat file in mind:
> 
> === Process of vDSO C/R ===
> 
> On *Dump*:
>         Before                                  After
>         ------                                  -----
> Checking vdso's pfn or filling          No need to do this on post v3.16
> symtable to find if VMA is vdso         kernels, as "[vdso]" mark stays
> or is mishinted by /proc/../maps        in maps file after mremap().
>         ------                                  -----
> Parsing of self-maps to find
> vdso/vvar position for remapping        No need to do this on dump.
> into restorer's parking zone.
>         ------                                  -----
> Parsing vdso's symtable to find         No need to do this on dump.
> if image's vdso matches host's vdso.
> 
> 
> On *Restore*:
>         Before                                  After
>         ------                                  -----
> Parsing of self-maps to find            Mapping vdso/vvar with
> vdso/vvar position for remapping        arch_prctl(MAP_VDSO_*)
> into restorer's parking zone.           to save sys_mremap() syscalls.
>         ------                                  -----
> Parsing vdso's symtable to find         Keeping vdso symtable in criu.kdat.
> if image's vdso matches host's vdso.
>         ------                                  -----
> Checking vdso's pfn or filling
> symtable to find if VMA is vdso         No need to do this on restore.
> or is mishinted by /proc/../maps
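
A rough sketch of how the presence of the vdso mapping API could be probed, in the spirit of the kdat test from patch 1 but not copied from it. It assumes x86-64 and the ARCH_MAP_VDSO_64 code from the kernel's <asm/prctl.h> (the cover letter abbreviates these as MAP_VDSO_*). probe_map_vdso() is a hypothetical name. The probe relies on the kernel refusing with EEXIST when a vdso is already mapped, so that error is read as "API present".

```c
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Value from the x86 uapi <asm/prctl.h>; defined here only as a fallback. */
#ifndef ARCH_MAP_VDSO_64
#define ARCH_MAP_VDSO_64 0x2003
#endif

/*
 * Returns 1 if arch_prctl(ARCH_MAP_VDSO_64) looks supported, 0 otherwise.
 * Hypothetical helper, not the CRIU implementation. Note that if the
 * calling process has no vdso at all, a successful call maps one.
 */
static int probe_map_vdso(void)
{
#if defined(__x86_64__) && defined(SYS_arch_prctl)
	long ret = syscall(SYS_arch_prctl, ARCH_MAP_VDSO_64, 0UL);

	if (ret >= 0)
		return 1;	/* the call mapped a vdso: API is there */

	/* vdso already mapped: the kernel knows the code but refuses. */
	return errno == EEXIST;
#else
	return 0;		/* assumption: only x86-64 is probed here */
#endif
}
```

A restorer could consult such a flag once and, when it is set, map the vdso at the wanted address directly instead of parsing self-maps and moving the blob with mremap().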
> 
> === Result: some numbers ===
> 
> I've tested the performance impact of the patch set on a
> 4.12.0-rc5+ kernel (the needed arch_prctl() is available since v4.9),
> comparing the current criu-dev af6399cc5 ("vma: Fix badly inherited FD
> in filemap_open") against my `wip/new-vdso' branch on GitHub.
> Done in QEMU with 2 GB of memory and 4 CPUs on fedora-26.
> Only values that differ are presented.
> 
> Mean results over 30 iterations of busyloop00 (single-process impact)
> and session00 (~10 processes impact).
> The deviation shown is from the session00 (before) runs, as it is
> the largest there.
> 
> *Dump*                         | === busyloop00 === | === session00 === max dev
>                                | before:     after: | before:    after:
> syscalls:sys_enter_splice      |      13         12 |      96        96
> syscalls:sys_enter_fcntl       |      30         29 |     200       199
> syscalls:sys_enter_unlinkat    |      25         26 |      31        31
> syscalls:sys_enter_pipe        |       6          5 |      45        44
> syscalls:sys_enter_newfstatat  |     157        159 |     334       334
> syscalls:sys_enter_read        |     100         96 |     378       374
> syscalls:sys_enter_write       |      17         11 |      57        56
> syscalls:sys_enter_pread64     |      16         17 |     137        73
> syscalls:sys_enter_writev      |       2          0 |       4         4
> syscalls:sys_enter_sendfile64  |       3          0 |       0         0
> syscalls:sys_enter_open        |      85         83 |      83        83
> syscalls:sys_enter_openat      |      68         62 |     436       426
> syscalls:sys_enter_close       |     172        160 |     655       641
> syscalls:sys_enter_brk         |      10          4 |      14        14
> syscalls:sys_enter_munmap      |       5          2 |      12         9
> syscalls:sys_enter_kcmp        |       5          5 |     111       105 +-0.45%
> syscalls:sys_enter_getpid      |      25         24 |      88        87
> syscalls:sys_enter_kill        |       2          0 |       2         0
> syscalls:sys_enter_exit        |       1          0 |       1         0
> syscalls:sys_enter_wait4       |      19         17 |     138       136
> syscalls:sys_enter_mmap        |      26         25 |      33        32
> syscalls:sys_enter_arch_prctl  |       2          1 |       2         1
> seconds time elapsed           |0.085441   0.082569 |0.107197  0.103572 +-2.98%
> 
> *Restore*                      | === busyloop00 === | === session00 === max dev
>                                | before:     after: | before:    after:
> syscalls:sys_enter_dup2        |     245        241 |    1748      1615 +-6.45%
> syscalls:sys_enter_fcntl       |     627        617 |    4450      4119 +-6.34%
> syscalls:sys_enter_pipe        |       1          0 |       5         4
> syscalls:sys_enter_read        |      50         43 |     140       133
> syscalls:sys_enter_write       |      16         15 |      33        32
> syscalls:sys_enter_pread64     |       1          0 |     176       168
> syscalls:sys_enter_open        |     165        163 |     915       849 +-6.17%
> syscalls:sys_enter_openat      |      82         78 |     208       204
> syscalls:sys_enter_close       |     355        343 |    2074      1933 +-5.44%
> syscalls:sys_enter_mremap      |       8          6 |     319       303
> syscalls:sys_enter_munmap      |      14         11 |      70        67
> syscalls:sys_enter_futex       |      32         32 |     472       426 +-7.29%
> syscalls:sys_enter_getpid      |      36         33 |     133       130
> syscalls:sys_enter_rt_sigproc..|     266        262 |    1786      1654 +-6.32%
> syscalls:sys_enter_kill        |     122        118 |     878       810 +-6.43%
> syscalls:sys_enter_exit        |       1          0 |       1         0
> syscalls:sys_enter_wait4       |      22         20 |     392       350 +-7.47%
> syscalls:sys_enter_mmap        |      68         67 |      96        95
> syscalls:sys_enter_arch_prctl  |       6          6 |      20        27
> seconds time elapsed           |0.041698  0.0328033 |0.106096  0.097462 +-1.83%
> 
> Given the deviation, dumping takes about the same time, heh,
> but makes fewer syscalls anyway.
> 
> In terms of time:
> Around 21% faster restore on a single busyloop!
> And 8% faster restore for ~10 processes.
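
The percentages follow from the "seconds time elapsed" rows of the restore table. A small sketch of the arithmetic, with speedup_percent() being a hypothetical helper name:

```c
/*
 * Relative reduction of elapsed time, in percent:
 * (before - after) / before * 100.
 */
static double speedup_percent(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

With the table values, speedup_percent(0.041698, 0.0328033) is about 21.3 for busyloop00 and speedup_percent(0.106096, 0.097462) is about 8.1 for session00.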
> 
> Cc: Cyrill Gorcunov <gorcunov at openvz.org>
> 
> Dmitry Safonov (11):
>   kdat: Add test for presence of vdso mapping API
>   vdso: Introduce vdso mark v3
>   vdso: Don't drop original VVAR VMA on dump
>   vdso: Don't miss rt-vvar while searching
>   vdso: Split parasite_fixup_vdso() once more
>   vdso: Add a comment about rt-vdso and decreasing nr. of symbols
>   vdso/zdtm: Add iterative proxification test
>   vdso/restorer: Don't map compatible vdso if it was unmapped
>   vdso: Don't parse self-maps if kdat.can_map_vdso
>   vdso/kdat: Add test for preserving "[vdso]" hint after mremap()
>   vdso: Don't read pagemap or parse symtable under vdso_hint_reliable
> 
>  criu/arch/aarch64/include/asm/restorer.h |   2 +
>  criu/arch/arm/include/asm/restorer.h     |   2 +
>  criu/arch/ppc64/include/asm/restorer.h   |   2 +
>  criu/arch/x86/crtools.c                  |  65 ++++--
>  criu/arch/x86/include/asm/restorer.h     |   6 +
>  criu/arch/x86/restorer.c                 |  10 +
>  criu/cr-check.c                          |  10 +
>  criu/cr-restore.c                        |   1 +
>  criu/include/kerndat.h                   |   2 +
>  criu/include/parasite-vdso.h             |  71 +++---
>  criu/include/parasite.h                  |   5 +-
>  criu/include/restorer.h                  |   1 +
>  criu/include/vdso.h                      |   2 +
>  criu/kerndat.c                           |  14 +-
>  criu/pie/parasite-vdso.c                 |  48 ++--
>  criu/pie/parasite.c                      |  14 +-
>  criu/pie/restorer.c                      |  32 ++-
>  criu/vdso.c                              | 383 ++++++++++++++++++++-----------
>  test/jenkins/criu-fault.sh               |   1 +
>  test/zdtm/static/Makefile                |   1 +
>  test/zdtm/static/vdso-proxy.c            | 147 ++++++++++++
>  21 files changed, 586 insertions(+), 233 deletions(-)
>  create mode 100644 test/zdtm/static/vdso-proxy.c
> 
> -- 
> 2.13.1
> 