[CRIU] [PATCH 3/4] arm64: provide own memcpy for pie

Dmitry Safonov dsafonov at virtuozzo.com
Thu Nov 10 04:11:59 PST 2016


On 11/10/2016 02:36 AM, Kir Kolyshkin wrote:
> On 11/08/2016 03:35 PM, Kir Kolyshkin wrote:
>>
>>
>> On 11/08/2016 11:18 AM, Dmitry Safonov wrote:
>>> On 11/08/2016 09:51 PM, Kir Kolyshkin wrote:
>>>>
>>>>
>>>> On 11/08/2016 04:09 AM, Dmitry Safonov wrote:
>>>>> On 11/08/2016 12:23 AM, Kir Kolyshkin wrote:
>>>>>> Chris, Dmitry,
>>>>>>
>>>>>> Can you please take a look at this (and, ideally, test on amd64 with
>>>>>> both clang and gcc)?
>>>>>
>>>>> Well, looks ok to me, the small nip is that we can omit if(n) check
>>>>> here, if I read asm rightly:
>>>>>
>>>>>>> +    if (n)
>>>>>>> +        memcpy_arm64(to, from, n);
>>>>>>> +    return to;
>>>>
>>>> This part I just copied from ppc64 implementation by Laurent.
>>>> I guess the reason is to save on a function call if n==0. If this
>>>> doesn't make sense, let me know and I'll drop it.
>>>
>>> Ok, keep it.
>>>
>>>>> I don't have arm64 board, but have a qemu VM.
>>>>> Anyway, it's quite big pile of code and if we can't see gain - and
>>>>> don't have arm64 boards to prove it's perf impact (I don't think it
>>>>> will be visible on qemu), what the fuss about?
>>>>> Don't get me wrong - I don't object the patch, it's ok to me, but
>>>>> why not simplify at this moment and just use for(...) a[i] = b[i]?
>>>>
>>>> Well, I don't like the complexity of arm64 memcpy() code either,
>>>> but I am not an expert so I can't complain.
>>>>
>>>> Now, the reason why asm is used is described in commit 214e280
>>>> ("pie: provide own memcpy for x86"), to which I referred in my commit.
>>>> Let me quote:
>>>>
>>>>> Another argument in favor of (3) is we already have implementation of
>>>>> builtin_memcpy() optimized for x86.
>>>>>
>>>>> The only problem is it is not named memcpy(). Using assembler file
>>>>> (.S)
>>>>> we can have a function with two names (entry points).
>>>>
>>>> It's just I don't know a way to achieve the same effect in C.
>>>
>>> Well, currently builtin_* functions are always_inline, so if just add
>>> i.e. memcmp(...) { return builtin_memcmp(...); } that will be like a
>>> synonym.
>>>
>>> For the other choice: de-inline those functions and customize PIE's
>>> linker script, somehow like:
>>>
>>> > --- a/criu/pie/pie-reloc.lds.S.in
>>> > +++ b/criu/pie/pie-reloc.lds.S.in
>>> > @@ -3,6 +3,10 @@ SECTIONS
>>> >         .text : {
>>> >                 *(.head.text)
>>> >                 *(.text*)
>>> > +               memcmp = builtin_memcmp ;
>>> > +               memcpy = builtin_memcpy ;
>>> > +               memset = builtin_memset ;
>>> > +               strncmp = builtin_strncmp ;
>>> >                 *(.compel.init)
>>> >                 *(.compel.exit)
>>>
>>> So it'll be the real synonym.
>>>
>>> The only difference will be if this is a function for compiler-inserted
>>> calls or for all calls.
>>
>> That's an interesting approach indeed, unfortunately I failed to make
>> it work, it always complains like this:
>>
>> ld -r -T criu/pie/pie.lds-native.S -o criu/pie/restorer.built-in.bin.o
>> criu/pie/restorer.built-in.o criu/pie/native.lib.a
>> criu/pie/pie.lds-native.S:8: undefined symbol `builtin_memcmp'
>> referenced in expression
>>
>> Andrey suggested using alias function attribute. I will play with it
>> and let you know.
>
> OK, since your suggestion to patch pie linker script didn't work for me,
> I tried using alias as suggested by Andrei.
>
> This almost worked! The idea is to add weak aliases, like this (see
> patch 0002* for the complete code):
>
> +#ifdef CR_NOGLIBC
> +void *memcpy(void *to, const void *from, size_t n)     \
> +            __attribute__ ((weak, alias ("builtin_memcpy")));
> +#endif
>
> The problem is, if system's memcpy/memcmp/memset have different prototypes,
> this approach fails. This happens with Alpine Linux musl:
>
>> /usr/include/fortify/string.h:37:27: error: redefinition of 'memcpy'
>     >  _FORTIFY_FN(memcpy) void *memcpy(void *__od, const void *__os,
> size_t
>     > __n)
>     >                            ^
>     > In file included from
>     > compel/plugins/include/uapi/std/syscall-types.h:13:0,
>     >                  from compel/plugins/include/uapi/std/syscall-64.h:5,
>     >                  from compel/plugins/include/uapi/std/syscall.h:8,
>     >                  from criu/pie/parasite.c:11:
>     > /usr/include/sched.h:72:7: note: previous definition of 'memcpy' was
>     > here
>     >  void *memcpy(void *__restrict, const void *__restrict, size_t);
>     >        ^
>
> Now, I can work around this by adding -U_FORTIFY_SOURCE to CFLAGS
> (see patch 0003*), but I'm afraid this all is going too far.
>
> You can see it for yourself, patches are attached (for reference only,
> not for inclusion).
> Note I also tried to remove the in-arch optimized implementations of
> mem*() functions
> to make sure the generic code is at least compilable on all platforms.
>
> I still think it's better to just provide optimized code for different
> arches, it's
> almost zero maintenance. Let me know what you think.

Hi Kir,

good work!
What I think:
we definitely should apply 1/5,
the 2/5, 3/5 also make sence and for those patches
Reviewed-by: Dmitry Safonov <dsafonov at virtuozzo.com>

Ok, so you think, it's better to provide optimized code for arm/aarch64
- then I suggest to do it on the top of those 3 patches:
so that would be easy to turn off, if we spot some problem with 
optimized versions by undefining HAS_BUILTIN_*.

Otherwise, if we don't have those three patches, generic version would
not make sence, as we have arch-optimized versions for each supported
platform and generic version doesn't work in some circumstances (with
clang currently).

So, there are two ways:
1. to remove completely generic version (as we have per-arch optimized)
2. to leave generic version working - if we spot any problem with
optimized versions or will need to add a new platform.

I suggest to do (2), while it does mean that we'll have generic version
always undefined - which means that we keep it as dead-code, but still
that's just a for-loop and I prefer to do this.

Cyrill, what do you think?

>>>>> I mean, if anything got broken in this asm (by any reason, you could
>>>>> see the history on the source link had some fixes already) - the
>>>>> people
>>>>> start comming with "help me, criu segfaults/corrupts memory/silently
>>>>> dies on arm64", which in my POV is harder to fix than for() loop,
>>>>> which I'm sure is rightly handled by any compiler.
>>>>>
>>>>> So, again I don't object, I'm ok if the patch applied - just don't get
>>>>> a reason for it.
>>>
>>
>


-- 
              Dmitry


More information about the CRIU mailing list