[Users] SATA HDD Problem

Vasily Averin vvs at sw.ru
Mon Jul 16 22:50:50 EDT 2007


Markus, Kirill

Regarding http://bugzilla.kernel.org/show_bug.cgi?id=8650:
we have not one but at least 3 different problems here:
1) An interrupt-related issue on VIA hardware.

Comment  #27 From Tejun Heo  2007-07-10 09:09:09:
2. The PATA and SATA controllers share a PCI IRQ line.  The PATA
controller also seem to be hardwired to 14 or 14/15 depending on
controller mode.  The driver/ide drivers use IRQ auto-detection and
detect 14 for the PATA part while libata honors pdev->irq and use 20.
The end result is the same tho.  One of the two hosts lose ability to
assert IRQ and everything falls down.  It's definitely related to IRQ
routing and is really peculiar.  Well, I wouldn't expect anything less
from the vendor. :-) Does "acpi=noirq" make any difference?

VvS: I would note that using the libata drivers (instead of ide) for the
PATA controller makes the situation much better, but unfortunately does
not close this issue completely: with the libata driver I have
reproduced this issue, although only once. I would also add that I still
cannot reproduce this issue with "acpi=noirq".

2) Another issue is infinite error-handling resets for the IDE-attached
DVD-ROM. It is unpleasant too, because it generates tons of garbage
messages in the system logs; however, it breaks nothing on my node and
therefore has low severity for me.

3) An ext3/jbd-related issue:
The AIM7 test leads to an ext3/jbd lockup on 2.6.22-rc4 and -rc7 kernels.
However, it looks like this issue has gone away: I have updated the
kernel to 2.6.22 and have not been able to reproduce it since Jul 12.
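
For issue 1, one quick check is whether two drivers really end up on the
same IRQ line. The following is only a rough sketch: it assumes the usual
/proc/interrupts layout (IRQ number, per-CPU counters, then handler names
separated by commas), and the function name and the sample text are made
up for illustration, not taken from any affected node.

```python
# Made-up sample in the usual /proc/interrupts layout (illustration only).
SAMPLE = """\
           CPU0
 14:       1234   IO-APIC-edge      ide0
 20:       5678   IO-APIC-fasteoi   libata, ide1
"""


def shared_irqs(text):
    """Return {irq: [handler names]} for IRQ lines with more than one handler."""
    shared = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        irq = head.strip()
        if not sep or not irq.isdigit():
            continue  # header line, or non-numeric rows such as NMI/ERR
        fields = rest.split()
        # Skip the per-CPU interrupt counters.
        while fields and fields[0].isdigit():
            fields.pop(0)
        if not fields:
            continue
        # Heuristic: handlers sharing a line are comma-separated at the end,
        # so the last word of each comma-separated piece is a handler name.
        tail = " ".join(fields)
        names = [part.split()[-1] for part in tail.split(",") if part.split()]
        if len(names) > 1:
            shared[int(irq)] = names
    return shared
```

On a real node the input would be the contents of /proc/interrupts, for
example shared_irqs(open("/proc/interrupts").read()), compared between a
normal boot and a boot with "acpi=noirq".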

I know nothing about 2.6.9-based kernels. IMHO the interrupt-related
issue should be present on these kernels too; however, I never saw bug
reports until we upgraded to 2.6.18 kernels.

I would also note that none of the 3 issues is Virtuozzo-specific, and
any new bug reports should be addressed to the libata or ext3 developers
rather than to me; I'm just a tester in this situation.

Markus, your situation is not clear to me. I cannot even confirm that
you have the same issue as I've observed. At first glance all the issues
look similar, but they can have different causes.

I do not know all the details of your situation and may be wrong.
However, IMHO the "device error" messages in your logs point to some
disk drive failure. I would note that in my case this message was
"timeout", and from my point of view that is an important difference.
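
To see which of the two messages dominates in a log, the reasons can be
counted. This is a minimal sketch, assuming the libata lines carry the
reason in parentheses after the Emask value, as in "Emask 0x4 (timeout)".
The sample lines are illustrative only and are not taken from Markus's or
my logs; the function name is hypothetical.

```python
import re
from collections import Counter

# Assumed format: the error class follows the Emask value in parentheses,
# e.g. "... Emask 0x4 (timeout)" or "... Emask 0x1 (device error)".
EMASK_RE = re.compile(r"Emask 0x[0-9a-fA-F]+ \(([^)]+)\)")

# Illustrative lines only, not copied from a real report.
SAMPLE_LOG = """\
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
         res 51/04:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
         res 51/04:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
"""


def count_error_classes(log_text):
    """Count how often each parenthesised Emask reason appears in the log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EMASK_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

A log dominated by "timeout" would look like my case, while one
dominated by "device error" would point more toward the drive itself.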

I would recommend that you find a way to reproduce the issue on your
node, collect all the information describing your situation (all kernel
messages from node boot onward, lspci -vvvxxx output, probably
something else) and send a bug report to the libata developers.
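
The collection step could be scripted roughly as follows. This is only a
sketch: the output file names and function names are arbitrary, dmesg
shows only what is still in the kernel ring buffer (older boot messages
may have to come from the system log files instead), and lspci normally
needs root to dump the full configuration space.

```python
import subprocess
from pathlib import Path

# Commands taken from the text above; the file names are arbitrary choices.
REPORT_COMMANDS = {
    "dmesg.txt": ["dmesg"],
    "lspci.txt": ["lspci", "-vvvxxx"],
}


def save_output(cmd, path):
    """Run cmd, write its stdout and stderr to path, return the exit code.

    Returns None (and records the reason in the file) if cmd cannot be run.
    """
    try:
        result = subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
        )
    except OSError as exc:
        Path(path).write_text("could not run %r: %s\n" % (cmd, exc))
        return None
    Path(path).write_bytes(result.stdout)
    return result.returncode


def collect(outdir, commands=REPORT_COMMANDS):
    """Run every command and store each output in its own file under outdir."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    return {
        name: save_output(cmd, outdir / name)
        for name, cmd in commands.items()
    }
```

Calling collect("bugreport") would then leave dmesg.txt and lspci.txt in
a "bugreport" directory, ready to attach to the report.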

If you don't want to investigate this bug, I recommend you try
replacing your hardware, beginning with the disk drive.

Thank you,
	Vasily Averin


