[CRIU] remap_dead_pid test hang?

Pavel Emelyanov xemul at parallels.com
Thu Mar 19 14:59:43 PDT 2015


On 03/20/2015 12:42 AM, Tycho Andersen wrote:
> On Thu, Mar 19, 2015 at 10:05:09PM +0300, Pavel Emelyanov wrote:
>> On 03/19/2015 10:01 PM, Pavel Emelyanov wrote:
>>> On 03/19/2015 09:54 PM, Tycho Andersen wrote:
>>>> On Thu, Mar 19, 2015 at 07:36:41PM +0300, Pavel Emelyanov wrote:
>>>>> On 03/19/2015 05:54 PM, Tycho Andersen wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> While testing the cgroup property restore patch, I noticed that the
>>>>>> remap_dead_pid test seems to hang both with my patch and the current
>>>>>> master. Has anyone else noticed this?
>>>>>
>>>>> No. We have Jenkins running the full zdtm suite (and more) 24x7,
>>>>> and though it fails sometimes, we haven't met any issues with this
>>>>> particular test.
>>>>>
>>>>> Does the test hang by itself, or after restore?
>>>>
>>>> How do I tell? It looks like it's just hung:
>>>>
>>>> root     32749  0.0  0.1  11652  3292 pts/1    S    18:49   0:00 /bin/bash ./zdtm.sh --ct -r static/remap_dead_pid
>>>> root       314  0.0  0.0   4196   644 pts/1    S    18:49   0:00  \_ ./zdtm_ct ./zdtm.sh -r static/remap_dead_pid
>>>> root       370  0.0  0.1  11752  3476 ?        Ss   18:49   0:00      \_ /bin/bash ./zdtm.sh -r static/remap_dead_pid
>>>> root     15963  0.0  0.1  11752  2372 ?        S    18:49   0:00          \_ /bin/bash ./zdtm.sh -r static/remap_dead_pid
>>>> root     15964  0.0  0.1   8464  2332 ?        S    18:49   0:00              \_ make -C zdtm/live/static remap_dead_pid.pid
>>>> 18943    17937  0.0  0.0   4492   816 ?        S    18:49   0:00                  \_ ./remap_dead_pid --pidfile=remap_dead_pid.pid --outfile=remap_dead_pid.out
>>>> 18943    17940  0.0  0.0   4492   104 ?        Ss   18:49   0:00                      \_ ./remap_dead_pid --pidfile=remap_dead_pid.pid --outfile=remap_dead_pid.out
>>>> 18943    17941  0.0  0.0   4492  1136 ?        S    18:49   0:00                          \_ ./remap_dead_pid --pidfile=remap_dead_pid.pid --outfile=remap_dead_pid.out
>>>
>>> Hm... :) There should be a directory where criu is supposed to put its output
>>> (logs and images), the test/dump/ one if I'm not mistaken. Then you can check
>>> the remap_dead_pid's directory for dump.log or restore.log to find
>>> out whether any of those actions took place.
>>>
>>
>> Ah, no, what am I saying?! Sorry, of course there haven't been any dumps
>> yet, the make $test.pid is still there. And it's not finished because the
>> process 17937 hasn't exited yet. Can you strace them all to check what
>> they are currently doing?
> 
> Yes, 17940 was in wait4, and 17941 was in nanosleep. I guess that
> means the kill() in that test failed somehow?

Looks like it. What bothers me is that the task is killed with SIGINT,
not SIGKILL. Is this signal blocked or ignored by 17941? Can you check
the SigBlk and SigIgn masks in /proc/<pid>/status?
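
For reference, the signal masks in /proc/<pid>/status are hex bitmasks in which bit (signo - 1) is set when that signal is in the set. A minimal sketch of decoding them, assuming Linux; the helper names signal_disposition and read_status are made up for illustration and are not part of CRIU or zdtm:

```python
import signal


def signal_disposition(status_text, signo=signal.SIGINT):
    """Decode SigBlk/SigIgn/SigCgt from the text of /proc/<pid>/status.

    Returns a dict mapping each mask name found to True if signo is in it.
    """
    result = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("SigBlk", "SigIgn", "SigCgt"):
            mask = int(value.strip(), 16)
            # Signal N is represented by bit N-1 of the mask.
            result[key] = bool(mask & (1 << (signo - 1)))
    return result


def read_status(pid):
    """Read the raw status file for a process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return f.read()
```

Calling signal_disposition(read_status(17941)) on the hung child would show whether SIGINT is blocked or ignored there; SigCgt set means a handler is installed.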

> Is there a race there that I am missing?

Shouldn't be. The code is pretty straightforward -- fork, kill then wait.
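
To make the pattern concrete, here is a rough sketch of fork, kill, then wait. It is not the remap_dead_pid source (which is C), and the function name fork_kill_wait is made up for illustration. It assumes Linux and that it is called from the main thread. The child explicitly resets the signal to its default disposition and unblocks it; if a child instead inherits SIGINT as ignored or blocked, the kill has no effect and the parent's wait never returns, which would match the hang above.

```python
import os
import signal
import time


def fork_kill_wait(sig=signal.SIGINT):
    """Fork a sleeping child, send it sig, and return its wait status."""
    # Block sig across fork() so the child cannot receive it before it
    # has installed the default disposition.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {sig})
    try:
        pid = os.fork()
        if pid == 0:
            try:
                # Child: make sure sig is neither ignored nor blocked.
                signal.signal(sig, signal.SIG_DFL)
                signal.pthread_sigmask(signal.SIG_UNBLOCK, {sig})
                while True:
                    time.sleep(1)
            finally:
                # Never fall back into the parent's code path.
                os._exit(1)
        os.kill(pid, sig)
        # Blocks until the child terminates.
        _, status = os.waitpid(pid, 0)
    finally:
        # Restore the parent's original signal mask.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return status
```

The returned status can be inspected with os.WIFSIGNALED and os.WTERMSIG to confirm the child was terminated by the signal that was sent.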

Thanks,
Pavel