[CRIU] zdtm/static/fd failure on aarch64
Adrian Reber
adrian at lisas.de
Mon Jun 11 21:42:41 MSK 2018
On Mon, Jun 11, 2018 at 05:59:51PM +0100, Dmitry Safonov wrote:
> 2018-06-11 16:55 GMT+01:00 Adrian Reber <adrian at lisas.de>:
> > On Mon, Jun 11, 2018 at 04:41:17PM +0100, Dmitry Safonov wrote:
> >> 2018-06-11 16:34 GMT+01:00 Adrian Reber <adrian at lisas.de>:
> >> [...]
> >> > No, still endless loop:
> >> >
> >> > 11:32:31.668: 4109: ERR: ../lib/lock.h:148: futex (errno = 11 (Resource temporarily unavailable))
> >> >
> >> > [pid 4109] futex(0xffff87000000, FUTEX_WAIT, 2264924161, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> >> > [pid 4109] newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}, 0) = 0
> >> > [pid 4109] write(2, "11:32:32.516: 4109: ERR: ../lib"..., 99) = 99
> >>
> >> Ugh, I adore this piece of code.
> >> The third time, lucky time? %)
> >
> > :( Sorry. Still:
> >
> > 11:54:16.950: 4210: ERR: ../lib/lock.h:148: futex (errno = 11 (Resource temporarily unavailable))
> >
> > [pid 4210] futex(0xffff9f650000, FUTEX_WAIT, 2674196480, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> > [pid 4210] newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}, 0) = 0
> > [pid 4210] write(2, "11:54:16.950: 4210: ERR: ../lib"..., 99) = 99
>
> Heh, could you attach 20-30 lines after the first
> futex(..., FUTEX_WAIT, 2,...) call in the endless loop?
4508 dup3(4, 0, 0) = 0
4508 close(4) = 0
4508 mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, 0, 0) = 0xffffa14a0000
4508 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xffffa1502dd0) = 4509
4508 futex(0xffffa14a0000, FUTEX_WAIT, 0, NULL <unfinished ...>
4509 setsid() = 4509
4509 rt_sigaction(SIGCHLD, {SIG_DFL, [], SA_RESTART}, NULL, 8) = 0
4509 mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0) = 0xffffa1300000
4509 futex(0xffffa1300000, FUTEX_WAIT, 2704277504, NULL) = -1 EAGAIN (Resource temporarily unavailable)
4509 brk(NULL) = 0x1f8d0000
4509 brk(0x1f900000) = 0x1f900000
4509 brk(NULL) = 0x1f900000
4509 openat(AT_FDCWD, "/etc/localtime", O_RDONLY|O_CLOEXEC) = 4
4509 fstat(4, {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
4509 fstat(4, {st_mode=S_IFREG|0644, st_size=3519, ...}) = 0
4509 mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffffa12f0000
4509 read(4, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0\0"..., 8192) = 3519
4509 lseek(4, -2252, SEEK_CUR) = 1267
4509 read(4, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0\5\0\0\0\0"..., 8192) = 2252
4509 close(4) = 0
4509 munmap(0xffffa12f0000, 65536) = 0
4509 write(2, "14:41:27.635: 4509: ERR: ../lib"..., 99) = 99
4509 futex(0xffffa1300000, FUTEX_WAIT, 2704277504, NULL) = -1 EAGAIN (Resource temporarily unavailable)
4509 newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}, 0) = 0
4509 write(2, "14:41:27.636: 4509: ERR: ../lib"..., 99) = 99
4509 futex(0xffffa1300000, FUTEX_WAIT, 2704277504, NULL) = -1 EAGAIN (Resource temporarily unavailable)
4509 newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0644, st_size=3519, ...}, 0) = 0
4509 write(2, "14:41:27.636: 4509: ERR: ../lib"..., 99) = 99
> I do see that there is a possible race like this:
>
> futex val CPU0/task0 CPU1/task1
> --------------- ---------------
> 1 atomic_inc()=2 atomic_inc()=3
> 3 futex(FUTEX_WAIT, 2)
> 3 EWOULDBLOCK
> 3 atomic_inc()=4
> 4
> futex(FUTEX_WAIT, 3)
> 4 EWOULDBLOCK
> ...
>
> But it looks quite unlikely to reproduce each cycle and I'm not sure
> it's the problem you're observing.
>
> Thanks,
> Dmitry
Adrian
More information about the CRIU
mailing list