[CRIU] remap_dead_pid test hang?

Pavel Emelyanov xemul at parallels.com
Fri Mar 20 01:16:57 PDT 2015


On 03/20/2015 01:20 AM, Tycho Andersen wrote:
> On Fri, Mar 20, 2015 at 12:59:43AM +0300, Pavel Emelyanov wrote:
>> On 03/20/2015 12:42 AM, Tycho Andersen wrote:
>>> On Thu, Mar 19, 2015 at 10:05:09PM +0300, Pavel Emelyanov wrote:
>>>> On 03/19/2015 10:01 PM, Pavel Emelyanov wrote:
>>>>> On 03/19/2015 09:54 PM, Tycho Andersen wrote:
>>>>>> On Thu, Mar 19, 2015 at 07:36:41PM +0300, Pavel Emelyanov wrote:
>>>>>>> On 03/19/2015 05:54 PM, Tycho Andersen wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> While testing the cgroup property restore patch, I noticed that the
>>>>>>>> remap_dead_pid test seems to hang both with my patch and the current
>>>>>>>> master. Has anyone else noticed this?
>>>>>>>
>>>>>>> No. We have Jenkins running the full zdtm suite (and more) around the
>>>>>>> clock, and although it fails sometimes, we haven't hit any issues with
>>>>>>> this particular test.
>>>>>>>
>>>>>>> Does the test hang by itself, or after restore?
>>>>>>
>>>>>> How do I tell? It looks like it's just hung:
>>>>>>
>>>>>> root     32749  0.0  0.1  11652  3292 pts/1    S    18:49   0:00 /bin/bash ./zdtm.sh --ct -r static/remap_dead_pid
>>>>>> root       314  0.0  0.0   4196   644 pts/1    S    18:49   0:00  \_ ./zdtm_ct ./zdtm.sh -r static/remap_dead_pid
>>>>>> root       370  0.0  0.1  11752  3476 ?        Ss   18:49   0:00      \_ /bin/bash ./zdtm.sh -r static/remap_dead_pid
>>>>>> root     15963  0.0  0.1  11752  2372 ?        S    18:49   0:00          \_ /bin/bash ./zdtm.sh -r static/remap_dead_pid
>>>>>> root     15964  0.0  0.1   8464  2332 ?        S    18:49   0:00              \_ make -C zdtm/live/static remap_dead_pid.pid
>>>>>> 18943    17937  0.0  0.0   4492   816 ?        S    18:49   0:00                  \_ ./remap_dead_pid --pidfile=remap_dead_pid.pid --outfile=remap_dead_pid.out
>>>>>> 18943    17940  0.0  0.0   4492   104 ?        Ss   18:49   0:00                      \_ ./remap_dead_pid --pidfile=remap_dead_pid.pid --outfile=remap_dead_pid.out
>>>>>> 18943    17941  0.0  0.0   4492  1136 ?        S    18:49   0:00                          \_ ./remap_dead_pid --pidfile=remap_dead_pid.pid --outfile=remap_dead_pid.out
>>>>>
>>>>> Hm... :) There should be a directory where criu is supposed to put its output
>>>>> (logs and images), the test/dump/ one if I'm not mistaken. Then you can check
>>>>> the remap_dead_pid directory there for dump.log or restore.log to find
>>>>> out whether either of those actions took place.
>>>>>
>>>>
>>>> Ah, no, what am I saying?! Sorry, of course there haven't been any dumps
>>>> yet, the make $test.pid is still there. And it's not finished because
>>>> process 17937 hasn't exited. Yet. Can you strace them all to check what
>>>> they are currently doing?
>>>
>>> Yes, 17940 was in wait4, and 17941 was in nanosleep. I guess that
>>> means the kill() in that test failed somehow?
>>
>> Looks like yes. What bothers me is that the task is killed with SIGINT,
>> not SIGKILL. Is this signal blocked or ignored by 17941? Can you check
>> it in /proc/pid/status?
> 
> Doesn't look like it,
> 
> http://paste.ubuntu.com/10630380/

Indeed. Neither mask contains INT :(
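
For reference, a minimal sketch of how that check can be done mechanically
rather than by eye. It assumes the usual Linux layout of /proc/<pid>/status,
where SigBlk, SigIgn and SigCgt are hex bitmasks and bit (N - 1) stands for
signal number N. The helper names are made up for illustration and are not
part of CRIU or zdtm.

```python
import signal
import sys

# Mask fields of interest in /proc/<pid>/status (blocked, ignored, caught).
MASK_FIELDS = ("SigBlk", "SigIgn", "SigCgt")


def parse_signal_masks(status_text):
    """Return {field: integer mask} for the signal-mask lines of a status file."""
    masks = {}
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name in MASK_FIELDS:
            masks[name] = int(value.strip(), 16)
    return masks


def signal_in_mask(mask, signo):
    """True if signal number signo is set in mask (bit signo - 1)."""
    return bool(mask & (1 << (int(signo) - 1)))


def read_signal_masks(pid):
    """Read and parse the masks of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_signal_masks(f.read())


if __name__ == "__main__":
    # Hypothetical usage: pass a pid, or inspect this process by default.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    for field, mask in read_signal_masks(pid).items():
        verdict = "contains" if signal_in_mask(mask, signal.SIGINT) else "does not contain"
        print(f"{field} {verdict} SIGINT")
```

If all three fields report that SIGINT is absent, the signal is neither
blocked, ignored nor caught, so the default action (terminate) should apply.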

> However, when I switch it to SIGKILL it does seem to succeed every
> time for me.

Wow... I wonder how it can happen that SIGINT sometimes doesn't kill
a task. Can you debug this a little bit more -- what does the kill()
syscall return? Does it succeed at all?
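
Something along these lines could serve as a standalone check, outside of
the test itself. This is only a sketch, not the remap_dead_pid code: it forks
a sleeping child, sends it a signal, and reports both whether kill() failed
and which signal (if any) actually terminated the child. The function name
and the pipe handshake are assumptions made for the example.

```python
import os
import signal
import time


def send_and_report(signo):
    """Fork a sleeping child, send it signo, return (kill_errno, term_signal)."""
    rfd, wfd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: make sure the signal has its default action and is not
        # blocked, tell the parent we are ready, then just sleep.
        try:
            os.close(rfd)
            if signo != signal.SIGKILL:  # SIGKILL's disposition cannot be set
                signal.signal(signo, signal.SIG_DFL)
                signal.pthread_sigmask(signal.SIG_UNBLOCK, {signo})
            os.write(wfd, b"x")
            time.sleep(60)
        finally:
            os._exit(0)

    # Parent: wait until the child has set up its signal disposition.
    os.close(wfd)
    os.read(rfd, 1)
    os.close(rfd)

    kill_errno = None
    try:
        os.kill(pid, signo)
    except OSError as e:
        # kill() itself failed; remember why and make sure the child dies.
        kill_errno = e.errno
        try:
            os.kill(pid, signal.SIGKILL)
        except OSError:
            pass

    _, status = os.waitpid(pid, 0)
    term_signal = os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
    return kill_errno, term_signal


if __name__ == "__main__":
    for signo in (signal.SIGINT, signal.SIGKILL):
        err, term = send_and_report(signo)
        print(f"sent {signo}: kill errno={err}, child terminated by signal {term}")
```

Running it in the same environment as the hanging test (for example inside
the zdtm_ct container) would show whether a successful kill() with SIGINT
can still leave the child alive there.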

> Tycho

Thanks,
Pavel


