[CRIU] [PATCHv7 18/33] lib/vdso: Add unlikely() hint into vdso_read_begin()

Vincenzo Frascino vincenzo.frascino at arm.com
Thu Oct 24 12:30:20 MSK 2019


Hi Andrei,

On 10/24/19 7:13 AM, Andrei Vagin wrote:
> On Wed, Oct 16, 2019 at 12:24:14PM +0100, Vincenzo Frascino wrote:
>> On 10/11/19 2:23 AM, Dmitry Safonov wrote:
>>> From: Andrei Vagin <avagin at gmail.com>
>>>
>>> Place the branch with no concurrent write before contended case.
>>>
>>> Performance numbers for Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
>>> (more clock_gettime() cycles - the better):
>>>         | before    | after
>>> -----------------------------------
>>>         | 150252214 | 153242367
>>>         | 150301112 | 153324800
>>>         | 150392773 | 153125401
>>>         | 150373957 | 153399355
>>>         | 150303157 | 153489417
>>>         | 150365237 | 153494270
>>> -----------------------------------
>>> avg     | 150331408 | 153345935
>>> diff %  | 2         | 0
>>> -----------------------------------
>>> stdev % | 0.3       | 0.1
>>>
>>> Signed-off-by: Andrei Vagin <avagin at gmail.com>
>>> Co-developed-by: Dmitry Safonov <dima at arista.com>
>>> Signed-off-by: Dmitry Safonov <dima at arista.com>
>>
>> Reviewed-by: Vincenzo Frascino <vincenzo.frascino at arm.com>
>> Tested-by: Vincenzo Frascino <vincenzo.frascino at arm.com>
> 
> Hello Vincenzo,
> 
> Could you test the attached patch on aarch64? On x86, it gives about 9%
> performance improvement for CLOCK_MONOTONIC and CLOCK_BOOTTIME.
> 

I ran similar tests in the past with a previous version of the unified vDSO
library. Based on those results, the impact of "__always_inline" alone was
around 7% on arm64. In fact, my implementation [1] carried a comment stating
"To improve performances, in this file, __always_inline it is used for the
functions called multiple times."

[1] https://bit.ly/2W9zMxB

I spent some time yesterday trying to work out why that approach did not make
the cut, but I could not infer the reason from the review process.
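
For readers following the thread, here is a minimal user-space sketch of the
two hints under discussion: unlikely() on the reader's "writer in progress"
branch, and forced inlining of the hot helpers. It is not the kernel code.
Every sketch_* name is hypothetical, C11 atomics stand in for the kernel's
READ_ONCE() and smp_rmb(), and a single writer is assumed.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Branch hint: tell the compiler the condition is rarely true. */
#define unlikely(x) __builtin_expect(!!(x), 0)
/* Hypothetical stand-in for the kernel's __always_inline. */
#define sketch_always_inline inline __attribute__((always_inline))

struct sketch_data {
	_Atomic uint32_t seq;   /* odd while the writer is mid-update */
	_Atomic uint64_t value; /* payload protected by seq */
};

/* Wait until no write is in progress; the spin is the unlikely path. */
static sketch_always_inline uint32_t sketch_read_begin(struct sketch_data *d)
{
	uint32_t seq;

	while (unlikely((seq = atomic_load_explicit(&d->seq,
						    memory_order_acquire)) & 1))
		; /* writer active, retry */
	return seq;
}

/* Non-zero if a writer changed seq since sketch_read_begin(). */
static sketch_always_inline int sketch_read_retry(struct sketch_data *d,
						  uint32_t start)
{
	atomic_thread_fence(memory_order_acquire);
	return unlikely(atomic_load_explicit(&d->seq,
					     memory_order_relaxed) != start);
}

/* Reader: repeat the read until it was not raced by the writer. */
static uint64_t sketch_read_value(struct sketch_data *d)
{
	uint32_t seq;
	uint64_t v;

	do {
		seq = sketch_read_begin(d);
		v = atomic_load_explicit(&d->value, memory_order_relaxed);
	} while (sketch_read_retry(d, seq));
	return v;
}

/* Single writer: make seq odd, update the payload, make seq even again. */
static void sketch_write_value(struct sketch_data *d, uint64_t v)
{
	uint32_t seq = atomic_load_explicit(&d->seq, memory_order_relaxed);

	atomic_store_explicit(&d->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&d->value, v, memory_order_relaxed);
	atomic_store_explicit(&d->seq, seq + 2, memory_order_release);
}
```

The hint only affects code layout: the uncontended path falls straight
through, and the spin and retry paths are moved out of line. Whether that is
measurable depends on the compiler and the CPU, which is why the numbers in
this thread matter.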

> Here is my test:
> https://github.com/avagin/vdso-perf
> 
> It is calling clock_gettime() in a loop for three seconds and then
> reports a number of iterations.
> 

I am happy to run the test on arm64 and provide some results.
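
For reference, the kind of measurement described above (call clock_gettime()
in a loop for a fixed time, then report the iteration count) can be sketched
as below. This is an illustration only and not the code from the vdso-perf
repository. The function name is hypothetical, and it assumes the clock under
test never goes backwards, since the same clock is used to detect the deadline.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes CLOCK_BOOTTIME on Linux/glibc */
#endif
#include <stdint.h>
#include <time.h>

/*
 * Count how many clock_gettime(clk_id) calls complete before `seconds`
 * have elapsed on that same clock. Returns 0 if clock_gettime() fails.
 */
static uint64_t count_clock_gettime_calls(clockid_t clk_id, time_t seconds)
{
	struct timespec start, now;
	uint64_t iterations = 0;

	if (clock_gettime(clk_id, &start) != 0)
		return 0;

	for (;;) {
		/* The measured call doubles as the deadline check. */
		if (clock_gettime(clk_id, &now) != 0)
			return 0;
		iterations++;

		time_t elapsed = now.tv_sec - start.tv_sec;
		if (elapsed > seconds ||
		    (elapsed == seconds && now.tv_nsec >= start.tv_nsec))
			break;
	}
	return iterations;
}
```

Calling count_clock_gettime_calls(CLOCK_MONOTONIC, 3) and printing the result
gives one sample comparable in spirit to the figures quoted in the commit
message, where a higher count is better.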

> Thanks,
> Andrei
> 

-- 
Regards,
Vincenzo

