[Users] Re: weird filesystem corruption issues

Aleksandar Ivanisevic aleksandar at ivanisevic.de
Wed Oct 10 05:06:00 EDT 2012


Kirill Korotaev <dev at parallels.com> writes:

Hi,

I might be able to provide access if you wish. Here are some more
facts:

Once I reboot the VE, the issue goes away for a few days, then it
happens again. The issue with gzip is just one manifestation; other
symptoms include files that appear corrupted but are fine again after
the reboot (something with the disk cache, perhaps?), and odd
segfaults and bus errors in various scripts and apps. We also used to
have MySQL crashes (datafile checksum corruption) in another VE
running on the same HN; those always stopped once we migrated it away,
and they have not happened at all with the last few kernels.
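
One way to narrow down where the truncation happens would be to
extend the gzip loop from the original mail (quoted below) so that each
round checks three things separately: a plain re-read of the source
file, a gzip | zcat round trip through a pipe, and the same round trip
through a file on disk. This is only a sketch: the names
roundtrip_check and _rt_report and the default of 20 rounds are made
up for illustration, and the reference checksum is simply taken from
the first read, so it assumes that first read is good.

```shell
# _rt_report ROUND STAGE SUM
# Helper: count and print a mismatch against the reference checksum.
# Uses the globals "ref" and "bad" set by roundtrip_check.
_rt_report() {
    [ "$3" = "$ref" ] && return 0
    echo "round $1: $2 mismatch ($3)"
    bad=$((bad + 1))
}

# roundtrip_check FILE [ROUNDS]
# Per round, compares three checksums against the reference:
#   read - plain re-read of the source file
#   pipe - gzip | zcat with no intermediate file
#   disk - gzip to a temporary file, then zcat from that file
roundtrip_check() {
    src=$1
    rounds=${2:-20}
    tmp=$(mktemp) || return 1
    # Assumption: the very first read returns the correct contents.
    ref=$(md5sum < "$src" | cut -d' ' -f1)
    bad=0
    i=1
    while [ "$i" -le "$rounds" ]; do
        _rt_report "$i" read \
            "$(md5sum < "$src" | cut -d' ' -f1)"
        _rt_report "$i" pipe \
            "$(gzip -c < "$src" | zcat 2>/dev/null | md5sum | cut -d' ' -f1)"
        gzip -c < "$src" > "$tmp"
        _rt_report "$i" disk \
            "$(zcat < "$tmp" 2>/dev/null | md5sum | cut -d' ' -f1)"
        i=$((i + 1))
    done
    rm -f "$tmp"
    echo "mismatches: $bad of $((rounds * 3)) checks"
    [ "$bad" -eq 0 ]
}
```

Running "roundtrip_check /tmp/application.log.backup 100" inside the
VE would then show whether only the "disk" stage ever mismatches, which
would point at writing or re-reading the compressed file rather than at
reading the source.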

Only this one VE exhibits this behaviour, out of 50 others running on
18 HNs with more or less identical hardware.

The VE is migrated regularly to other nodes (both offline and online)
during maintenance, and it has shown this problem on every HN it has
run on.

The nodes are IBM xSeries servers with ECC memory, so I don't think
it's a physical memory issue.

The VE runs a Nagios/cfengine/syslogd server. It is fairly loaded, but
it is not the most loaded VE in our environment.

Most of the limits are set to the max, and failcnts are always zero.
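
For reference, a check along these lines could be used to confirm the
failcnt claim. It is a minimal sketch that assumes the usual
/proc/user_beancounters layout (two header lines, failcnt as the last
column, and a leading "uid:" field on the first row of each
container); the function name nonzero_failcnt is made up for
illustration.

```shell
# nonzero_failcnt [FILE]
# Print every beancounter row whose failcnt (last column) is non-zero.
# FILE defaults to /proc/user_beancounters.
nonzero_failcnt() {
    awk 'NR > 2 {
        # The first row of each container starts with a "uid:" field;
        # the following rows start directly with the resource name.
        if ($1 ~ /:$/) { uid = $1; res = $2 } else { res = $1 }
        if ($NF ~ /^[0-9]+$/ && $NF + 0 > 0)
            print uid, res, "failcnt=" $NF
    }' "${1:-/proc/user_beancounters}"
}
```

Called as root with no argument inside the VE, it prints nothing when
every failcnt is zero.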

Any pointers on where I should look are appreciated.

regards,

> Can you provide access and demonstrate this on the real node?
> The only guesses I have are that some application is changing your files in /tmp or that you have memory corruption, so running memtest86 is recommended anyway.
>
> Thanks,
> Kirill
>
>
> On Oct 9, 2012, at 16:34 , Aleksandar Ivanisevic <aleksandar at ivanisevic.de> wrote:
>
>> 
>> Hi,
>> 
>> Please help me debug this weird issue. This has been happening
>> occasionally in my setup for literally years, on at least 10 different
>> OVZ kernels.
>> 
>> in VE:
>> 
>> # md5sum /tmp/application.log.backup 
>> 89024ce67704e3cf2aa9e7b2e2584a60  /tmp/application.log.backup
>> 
>> # gzip > application.log.backup.gz < /tmp/application.log.backup 
>> # zcat application.log.backup.gz | md5sum
>> 
>> zcat: application.log.backup.gz: unexpected end of file
>> 986389b791ee94692da36a56be29392a  -
>> 
>> but the next attempt 10 seconds later:
>> 
>> # gzip > application.log.backup.gz < /tmp/application.log.backup 
>> # zcat application.log.backup.gz | md5sum
>> 89024ce67704e3cf2aa9e7b2e2584a60  -
>> 
>> The file is truncated at a random place.
>> 
>> I can reliably reproduce this by running this in a loop:
>> 
>> root@ /var/log while true; do gzip > application.log.backup.gz < /tmp/application.log.backup ; zcat application.log.backup.gz | md5sum; done
>> 89024ce67704e3cf2aa9e7b2e2584a60  -
>> 
>> zcat: application.log.backup.gz: unexpected end of file
>> ad830a43ccf4641afc2c0dfd42b3d5b8  -
>> 89024ce67704e3cf2aa9e7b2e2584a60  -
>> 89024ce67704e3cf2aa9e7b2e2584a60  -
>> 
>> zcat: application.log.backup.gz: unexpected end of file
>> a35d71d503b3cfc249409075afd9295f  -
>> 89024ce67704e3cf2aa9e7b2e2584a60  -
>> 89024ce67704e3cf2aa9e7b2e2584a60  -
>> 
>> 
>> But when I run it from the HN, there is never any issue:
>> 
>> 
>> # while true; do gzip > application.log.backup.gz < /vz/private/1090/tmp/application.log.backup ; zcat application.log.backup.gz | md5sum; done
>> 89024ce67704e3cf2aa9e7b2e2584a60  -
>> 89024ce67704e3cf2aa9e7b2e2584a60  -
>> 89024ce67704e3cf2aa9e7b2e2584a60  -
>> 89024ce67704e3cf2aa9e7b2e2584a60  -
>> 89024ce67704e3cf2aa9e7b2e2584a60  -
>> 89024ce67704e3cf2aa9e7b2e2584a60  -
>> 89024ce67704e3cf2aa9e7b2e2584a60  -
>> 89024ce67704e3cf2aa9e7b2e2584a60  -
>> 89024ce67704e3cf2aa9e7b2e2584a60  -
>> 89024ce67704e3cf2aa9e7b2e2584a60  -
>> 89024ce67704e3cf2aa9e7b2e2584a60  -
>> 89024ce67704e3cf2aa9e7b2e2584a60  -
>> 89024ce67704e3cf2aa9e7b2e2584a60  -
>> 

-- 
You are arrogant, overbearing, and venting your own frustration. -- Ivan
Tišljar, hr.comp.os.linux


