[Devel] Re: [C/R] sleepers don't wake up on restart

Oren Laadan orenl at cs.columbia.edu
Wed Apr 29 14:45:32 PDT 2009


Hi,

Sukadev Bhattiprolu wrote:
> Oren Laadan [orenl at cs.columbia.edu] wrote:
> | 
> | I just posted v14-rc3 which includes the c/r of restart-blocks.
> | That should improve the situation.
> | 
> | However, depending on which syscalls one uses, process may still
> | seem "stuck" after restart because the current code still does
> | not save signals nor task timers; If a signal was pending (SIGALRM
> | for example) after freezing but before checkpoint, it will be lost.
> | If a timer was set at checkpoint, it will not be restored.
> | 
> | So depending on your program, you may still experience issues
> | until I add patches to handle that.
> 
> Ok, Just an fyi, the original program seemed to work fine, but when
> I try to restart a small process tree, I get stuck on restart again.
> 
> I am running on v14-rc3 branch. Has this got anything to do with
> pending SIGCHLD ? Seems to be easier to repro with larger process
> trees (2 children per process, 4 or more levels deep).

Could be. You can verify by adding a couple of lines to the
checkpoint code that complain if any task being checkpointed
has signals pending.
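
A rough user-space approximation of that check, for anyone who wants to
test the theory without patching the kernel: read the SigPnd (per-task)
and ShdPnd (process-wide) hex masks from /proc/<pid>/status for each
frozen task just before checkpoint. This is only a sketch and not the
in-kernel change described above; the helper names parse_pending_mask,
mask_has_signal and report_pending are made up for illustration, and
the bit layout (signal N in bit N-1) is assumed to match Linux's usual
numbering.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Parse one /proc/<pid>/status line such as "SigPnd:\t0000000000010000".
 * Returns 1 and stores the hex mask if the line starts with "<key>:",
 * otherwise returns 0 and leaves *mask untouched. */
static int parse_pending_mask(const char *line, const char *key,
                              unsigned long long *mask)
{
    size_t klen = strlen(key);
    const char *val;
    char *end;
    unsigned long long v;

    if (strncmp(line, key, klen) != 0 || line[klen] != ':')
        return 0;
    val = line + klen + 1;
    v = strtoull(val, &end, 16);   /* skips the leading tab */
    if (end == val)
        return 0;                  /* no hex digits found */
    *mask = v;
    return 1;
}

/* Signal N occupies bit N-1 of the mask (assumed layout). */
static int mask_has_signal(unsigned long long mask, int signo)
{
    return signo >= 1 && signo <= 64 && ((mask >> (signo - 1)) & 1ULL);
}

/* Complain on stderr about every non-empty pending mask of a task.
 * Returns the number of non-empty masks, or -1 if the status file
 * cannot be opened. */
static int report_pending(pid_t pid)
{
    char path[64], line[256];
    unsigned long long mask;
    int found = 0;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%ld/status", (long)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if ((parse_pending_mask(line, "SigPnd", &mask) ||
             parse_pending_mask(line, "ShdPnd", &mask)) && mask != 0) {
            fprintf(stderr, "pid %ld: %.6s = %llx (signals pending)\n",
                    (long)pid, line, mask);
            found++;
        }
    }
    fclose(f);
    return found;
}
```

Calling report_pending() on each PID in the tree while it is frozen
should show whether anything is queued at that point; mask_has_signal()
can then be used to test a reported mask for one particular signal
number.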

BTW, the current code ignores zombie processes.

Support for both (signals and zombies) is in the queue.

Oren.

> 
> Test programs (attached) (they need some cleanup though)
> 
> 	ptree2.c
> 	p2.loop
> 
> --------- Processes after restart:
> 
> $ ps -ef|grep ptree
> 
> root     10461 10459  0 22:07 pts/0    00:00:00 ./ptree2 -n 1 -d 2
> root     10465 10461  0 22:07 pts/0    00:00:00 ./ptree2 -n 1 -d 2
> root     10466 10465  0 22:07 pts/0    00:00:00 [ptree2] <defunct>
> root     10479  8220  0 22:09 pts/1    00:00:00 grep ptree
> 
> ---------- Process stacks
> 
> tree2        S f6270a90     0 10461  10459
>  f5e59380 00000082 08048a86 f6270a90 f6270bfc c2b32260 00000000 0000d9d3
>  f5f423b0 00000000 ffffffff 00000000 00000000 00000001 00000000 f6270a88
>  00000000 f6270a90 00000000 c02243aa 00000004 00000003 0000000c 00000006
> Call Trace:
>  [<c02243aa>] do_wait+0x1dd/0x2f6
>  [<c021cd14>] default_wake_function+0x0/0x8
>  [<c0224542>] sys_wait4+0x7f/0x92
>  [<c0224568>] sys_waitpid+0x13/0x17
>  [<c0202ce5>] sysenter_do_call+0x12/0x25
>  [<c0510000>] rtl8139_init_one+0x5ae/0x887
> ptree2        S f5f423b0     0 10465  10461
>  f6002180 00000082 c2b265c8 f5f423b0 f5f4251c c2b29260 f67b1f44 e06d0177
>  00000282 c023363c c2b265c8 00000000 00000282 0000c350 00000001 0000c350
>  00000001 f67b1f44 0000c350 c051be99 00000000 00000001 0000c350 bf9d0e04
> Call Trace:
>  [<c023363c>] hrtimer_start_range_ns+0x105/0x111
>  [<c051be99>] do_nanosleep+0x54/0x8c
>  [<c02336d7>] hrtimer_nanosleep+0x8f/0xee
>  [<c02332b8>] hrtimer_wakeup+0x0/0x18
>  [<c051be7f>] do_nanosleep+0x3a/0x8c
>  [<c0233777>] sys_nanosleep+0x41/0x51
>  [<c0202ce5>] sysenter_do_call+0x12/0x25
> ptree2        ? f6bee040     0 10466  10465
>  f638cb80 00000046 00200200 f6bee040 f6bee1ac c2b17260 f6bee038 0000dd77
>  00000000 c022f576 ffffffff 00000303 00000000 00000001 00000000 00000012
>  f5a61e84 f6bee040 f6bee038 c0224c29 f6270a90 00000001 f6bee038 f5a61f88
> Call Trace:
>  [<c022f576>] wakeme_after_rcu+0x0/0x8
>  [<c0224c29>] do_exit+0x638/0x63c
>  [<c0224c87>] do_group_exit+0x5a/0x83
>  [<c0224cbd>] sys_exit_group+0xd/0x10
>  [<c0202ce5>] sysenter_do_call+0x12/0x25
> 