[CRIU] weird usleep() behaviour on dump

Fri Jun 14 16:45:29 MSK 2019

HI,

On 6/14/19 10:06 AM, Radostin Stoyanov wrote:
> On 13/06/2019 19:35, Dmitry Safonov wrote:
>> On 6/13/19 7:27 PM, Andrei Vagin wrote:
>>> Cc: Dima
>>>
>>> On Thu, Jun 13, 2019 at 6:10 AM Radostin Stoyanov <rstoyanov1 at gmail.com> wrote:
>>>> Hi Andi,
>>>>
>>>> On 13/06/2019 13:31, andi wrote:
>>>>> Hi guys!
>>>>>
>>>>> I’m running CRIU 3.11 on a 4.9 ARM64 Kernel.
>>>>>
>>>>>
>>>>> I’m observing the following. If I have a simple C-Program that sleeps
>>>>> for 5 seconds (usleep(5000000)), and if I dump it after 3 seconds,
>>>>> then the sleep will take 8 seconds. If I dump it periodically every
>>>>> second, the usleep() will never return, and from the moment I stop
>>>>> dumping until it returns it will take 5 seconds.
>>> Could you run strace -fo strace.log -s 256 criu restore .... and send
>>> us the strace.log file?

sure I can (see below):

24085 write(4, "(00.050290) Writing stats\n", 26) = 26
24085 openat(AT_FDCWD, "stats-restore", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0644) = 10
24085 write(10, "@Y\20U\0063\tW\17\0\0\0\22\r\10\0\20\0\30\205\2 \215\370\2(\16", 27) = 27
24085 close(10)                         = 0
24085 gettimeofday({tv_sec=1560518843, tv_usec=670867}, NULL) = 0
24085 write(4, "(00.050610) Running post-resume scripts\n", 40) = 40
24085 wait4(-1,  <unfinished ...>
24059 <... futex resumed> )             = 1
24059 close(11)                         = 0
24059 close(255)                        = 0
24059 munmap(0x127000, 20480)           = 0
24059 rt_sigreturn({mask=[]})           = -6267096
24059 nanosleep({tv_sec=50, tv_nsec=0}, NULL) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
24059 --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
24059 +++ killed by SIGINT +++
24085 <... wait4 resumed> NULL, 0, NULL) = 24059
24085 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=24059, si_uid=0, si_status=SIGINT, si_utime=0, si_stime=0} ---
24085 exit_group(0)                     = ?
24085 +++ exited with 0 +++

BUT that is not the problem I am experiencing -- I am dumping periodically and my application is blocking during DUMP.
I haven't had any problems with restore so far (at least not in regard to sleeps).

>>>
>>>>>
>>>>> So it looks as if usleep() got restarted with every dump.
>>>>>
>>>>> Do you guys have an idea why that is? Where it happens?
>>>>>
>>>> It is likely that usleep() is interrupted by CRIU and if it is called
>>>> within a loop it will start again.
>>> Why do we have some logic about restarting system calls for x86 and
>>> don't have something similar for aarch64?
>>>
>>> https://github.com/checkpoint-restore/criu/blob/criu-dev/compel/arch/x86/src/lib/infect.c#L270
>> Probably, because zdtm misses a test.. ;-)
>>
>
> I'm not sure about this. I tested the following example:
>
> #include <stdio.h>
> #include <unistd.h>
> #include <errno.h>
> #include <string.h>
>
> int main()
> {
>     int err, ret;
>     ret = usleep(5000000);
>     err = errno;
>     printf("%d %s\n", ret, strerror(err));
>     return 0;
> }
>
> After restore, the output is:
> -1 Interrupted system call
>
> When the --leave-running flag is used, the same output appears immediately after dump.
nope, that is not what I  experience on aarch64. I tried your test app as well, the one was using since yesterday looks similar, just calling the sleep in a loop:

#include<stdio.h>
#include<unistd.h>

int main(int argc, char **argv)
{
         int ret, err;

         printf("Hello!\n");

         while (1) {
                 ret = usleep(1000000);
                 if (ret) {
                     err = errno;
                     printf("%d %s\n", ret, strerror(err));
                 }
                 printf(".");
                 fflush(stdout);
         }
         printf("Bye!\n");
}

I execute it as follows:

# ./test_usleep 2> /tmp/junk > /tmp/junk1 < /dev/null

and on another console dump in a loop:

# while [ 1 ]; do criu dump -t 24069 -D /run/criu/d/ -j  -vvvv --leave-running; usleep 300; done

then the char count in /tmp/junk1 stops increasing until I kill the while [ 1 ] loop.

I dug a little deeper yesterday, and found out that the issue only happens when using relative timers, so the
solution that works for me so far is a pre-load library that converts sleep(), usleep(), nanosleep and
clock_nanosleep() with relative times into absolute timers.

Using that pre-load lib and the above example results in the dot count happily increasing even when my dump loop is running.

>
> I think that this is expected behaviour and applications that use usleep() should handle EINTR.
>
> Also usleep() is not a system call, and in the POSIX specification [1] is explicitly said:

jup, sleep() and usleep() are implemented using a nanosleep() syscall.

thx!
andi

>
> The usleep() function shall cause the calling thread to be suspended from execution until either the number of realtime microseconds specified by the argument useconds has elapsed or a signal is delivered to the calling thread ...
>
> [1] http://pubs.opengroup.org/onlinepubs/009695399/functions/usleep.html
>
> Radostin