[Users] Issues after updating to 7.0.14 (136)

Kevin Drysdale kevin.drysdale at iomart.com
Tue Jul 7 12:42:37 MSK 2020


Hello,

Thanks to all who have replied to this thread so far - my apologies for 
taking so long to get back to you all.

In terms of where I'm seeing the EXT4 errors, they are showing up in the 
kernel log on the node itself, so the output of 'dmesg' regularly shows 
entries such as these:

[375095.199203] EXT4-fs (ploop43209p1): Remounting filesystem read-only
[375095.199267] EXT4-fs error (device ploop43209p1) in ext4_ext_remove_space:3073: IO failure
[375095.199400] EXT4-fs error (device ploop43209p1) in ext4_ext_truncate:4692: IO failure
[375095.199517] EXT4-fs error (device ploop43209p1) in ext4_reserve_inode_write:5358: Journal has aborted
[375095.199637] EXT4-fs error (device ploop43209p1) in ext4_truncate:4145: Journal has aborted
[375095.199779] EXT4-fs error (device ploop43209p1) in ext4_reserve_inode_write:5358: Journal has aborted
[375095.199957] EXT4-fs error (device ploop43209p1) in ext4_orphan_del:2731: Journal has aborted
[375095.200138] EXT4-fs error (device ploop43209p1) in ext4_reserve_inode_write:5358: Journal has aborted
[461642.709690] EXT4-fs (ploop43209p1): error count since last fsck: 8
[461642.709702] EXT4-fs (ploop43209p1): initial error at time 1593576601: ext4_ext_remove_space:3000: inode 136354
[461642.709708] EXT4-fs (ploop43209p1): last error at time 1593576601: ext4_reserve_inode_write:5358: inode 136354

Inside the container itself, not much is being logged, since the affected 
container in this particular instance is indeed (as per the errors above) 
mounted read-only due to the errors its root.hdd filesystem is 
experiencing.
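
For reference, one quick way to confirm the read-only state from the node 
itself is to check /proc/mounts for the ploop device named in the dmesg 
output (the device name below is just the one from my logs, so adjust as 
needed):

# grep ploop43209p1 /proc/mounts

If the mount options in that line include 'ro', the container's filesystem 
has indeed been remounted read-only.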

Having dug a bit more into what happened here, I suspect that this 
corruption may have come about when the containers were being moved 
between the live node and the standby node, but I can't be 100% sure of 
that.

The picture is further muddied in that the standby node (the node we used 
for evacuating containers from the node being updated) was itself 
initially updated to 7.0.14 (135).  However, the live node (which was 
updated a short time after the standby node) appears to have received 
7.0.14 (136).  So I don't know whether the issue was in fact with 7.0.14 
(135) (which was on the standby node, where the containers were moved to 
and later moved back from), or with 7.0.14 (136) on the live node.  Were 
there any known issues with 7.0.14 (135) that might correlate with what 
I'm seeing above?
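
If it's useful for comparison, the build on each node can be confirmed 
from the release file and the package install history, e.g.:

# cat /etc/virtuozzo-release
# rpm -qa --last | head -20

The 'rpm -qa --last' output lists the most recently installed packages 
along with their install dates, which should show whether a given node 
picked up the (135) or the (136) package set.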

Anyway, once again, thanks to everyone who has replied so far.  If anyone 
has any further questions or would like any further information, please 
let me know and I will be happy to assist.

Thank you,
Kevin Drysdale.


On Thu, 2 Jul 2020, Jehan PROCACCIA wrote:

> yes, you are right, I do get the same virtuozzo-release as mentioned in the initial subject, sorry for the noise.
> 
> # cat /etc/virtuozzo-release
> OpenVZ release 7.0.14 (136)
> 
> but anyway, I don't see any ploop / fsck error in the host /var/log/vzctl.log
> inside the CT, where did you see those errors?
> 
> Jehan .
> 
> _____________________________________________________________________________________________________________________________________________________
> From: "jjs - mainphrame" <jjs at mainphrame.com>
> To: "OpenVZ users" <users at openvz.org>
> Sent: Thursday, 2 July 2020 19:33:23
> Subject: Re: [Users] Issues after updating to 7.0.14 (136)
> 
> Thanks for that sanity check; the conundrum is resolved. vzlinux-release and virtuozzo-release are indeed different things.
> Jake
> 
> On Thu, Jul 2, 2020 at 10:27 AM Jonathan Wright <jonathan at knownhost.com> wrote:
>
>       /etc/redhat-release and /etc/virtuozzo-release are two different things.
>
>       On 7/2/20 12:16 PM, jjs - mainphrame wrote:
>       Jehan - 
>
>       I get the same output here -
>
>       [root at annie ~]# yum repolist  |grep virt
>       virtuozzolinux-base    VirtuozzoLinux Base                            15,415+189
>       virtuozzolinux-updates VirtuozzoLinux Updates                                  0
>
>       I'm baffled as to how you're on 7.8.0 while I'm at 7.0.15 even though I'm fully up to date.
>
>       # uname -a
>       Linux annie.ufcfan.org 3.10.0-1127.8.2.vz7.151.10 #1 SMP Mon Jun 1 19:05:52 MSK 2020 x86_64 x86_64 x86_64 GNU/Linux
> 
> Jake
> 
> On Thu, Jul 2, 2020 at 10:08 AM Jehan PROCACCIA <jehan.procaccia at imtbs-tsp.eu> wrote:
>       no factory, just repos virtuozzolinux-base and openvz-os
> 
> # yum repolist  |grep virt
> virtuozzolinux-base    VirtuozzoLinux Base                            15 415+189
> virtuozzolinux-updates VirtuozzoLinux Updates                                  0
> 
> Jehan .
> 
> _____________________________________________________________________________________________________________________________________________________
> From: "jjs - mainphrame" <jjs at mainphrame.com>
> To: "OpenVZ users" <users at openvz.org>
> Cc: "Kevin Drysdale" <kevin.drysdale at iomart.com>
> Sent: Thursday, 2 July 2020 18:22:33
> Subject: Re: [Users] Issues after updating to 7.0.14 (136)
> 
> Jehan, are you running factory?
> 
> My ovz hosts are up to date, and I see:
> 
> [root at annie ~]# cat /etc/virtuozzo-release
> OpenVZ release 7.0.15 (222)
> 
> Jake
> 
> 
> On Thu, Jul 2, 2020 at 9:08 AM Jehan Procaccia IMT <jehan.procaccia at imtbs-tsp.eu> wrote:
>       "updating to 7.0.14 (136)" !?
> 
> I did an update yesterday, and I am well past that version
> 
> # cat /etc/vzlinux-release
> Virtuozzo Linux release 7.8.0 (609)
> 
> # uname -a
> Linux localhost 3.10.0-1127.8.2.vz7.151.14 #1 SMP Tue Jun 9 12:58:54 MSK 2020 x86_64 x86_64 x86_64 GNU/Linux
> 
> why don't you try updating to the latest version?
> 
> 
> On 29/06/2020 at 12:30, Kevin Drysdale wrote:
>       Hello,
>
>       After updating one of our OpenVZ VPS hosting nodes at the end of last week, we've started to have issues with
>       corruption apparently occurring inside containers.  Issues of this nature have never affected the node
>       previously, and there do not appear to be any hardware issues that could explain this.
>
>       Specifically, a few hours after updating, we began to see containers experiencing errors such as these in the
>       logs:
>
>       [90471.678994] EXT4-fs (ploop35454p1): error count since last fsck: 25
>       [90471.679022] EXT4-fs (ploop35454p1): initial error at time 1593205255: ext4_ext_find_extent:904: inode 136399
>       [90471.679030] EXT4-fs (ploop35454p1): last error at time 1593232922: ext4_ext_find_extent:904: inode 136399
>       [95189.954569] EXT4-fs (ploop42983p1): error count since last fsck: 67
>       [95189.954582] EXT4-fs (ploop42983p1): initial error at time 1593210174: htree_dirblock_to_tree:918: inode
>       926441: block 3683060
>       [95189.954589] EXT4-fs (ploop42983p1): last error at time 1593276902: ext4_iget:4435: inode 1849777
>       [95714.207432] EXT4-fs (ploop60706p1): error count since last fsck: 42
>       [95714.207447] EXT4-fs (ploop60706p1): initial error at time 1593210489: ext4_ext_find_extent:904: inode 136272
>       [95714.207452] EXT4-fs (ploop60706p1): last error at time 1593231063: ext4_ext_find_extent:904: inode 136272
>
>       Shutting the containers down and manually mounting and e2fsck'ing their filesystems did clear these errors, but
>       each of the containers (which were mostly used for running Plesk) had widespread issues with corrupt or missing
>       files after the fsck's completed, necessitating their being restored from backup.
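>
>       (In case the exact recovery steps are useful to anyone, what we ran was roughly along these lines, with the
>       container stopped first - the CT ID is just one of the affected containers, and the partition device is
>       whatever 'ploop mount' reports on the node:
>
>       # vzctl stop 8288448
>       # ploop mount /vz/private/8288448/root.hdd/DiskDescriptor.xml
>       # e2fsck -f /dev/ploopXXXXXp1
>       # ploop umount /vz/private/8288448/root.hdd/DiskDescriptor.xml
>
>       followed by restarting the container with 'vzctl start'.)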
>
>       Concurrently, we also began to see messages like these appearing in /var/log/vzctl.log, which again had never
>       appeared at any point prior to this update being installed:
>
>       /var/log/vzctl.log:2020-06-26T21:05:19+0100 : Error in fill_hole (check.c:240): Warning: ploop image
>       '/vz/private/8288448/root.hdd/root.hds' is sparse
>       /var/log/vzctl.log:2020-06-26T21:09:41+0100 : Error in fill_hole (check.c:240): Warning: ploop image
>       '/vz/private/8288450/root.hdd/root.hds' is sparse
>       /var/log/vzctl.log:2020-06-26T21:16:22+0100 : Error in fill_hole (check.c:240): Warning: ploop image
>       '/vz/private/8288451/root.hdd/root.hds' is sparse
>       /var/log/vzctl.log:2020-06-26T21:19:57+0100 : Error in fill_hole (check.c:240): Warning: ploop image
>       '/vz/private/8288452/root.hdd/root.hds' is sparse
>
>       The basic procedure we follow when updating our nodes is as follows:
>
>       1. Update the standby node we keep spare for this process
>       2. vzmigrate all containers from the live node being updated to the standby node (example invocation below)
>       3. Update the live node
>       4. Reboot the live node
>       5. vzmigrate the containers from the standby node back to the live node they originally came from
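>
>       For reference, steps 2 and 5 are plain 'vzmigrate' invocations along these lines (the destination hostname
>       and CT ID here are placeholders only):
>
>       # vzmigrate standby-node.example.com 8288448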
>
>       So the only tool which has been used to affect these containers is 'vzmigrate' itself, and I'm at something of
>       a loss to explain how the root.hdd images for these containers have come to contain sparse gaps.  Creating
>       sparse images is something we have never done, as we have always been aware that OpenVZ does not support sparse
>       files inside a container's hard drive image.  And the fact that these images have suddenly become sparse at the
>       same time they have started to exhibit filesystem corruption is somewhat concerning.
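>
>       (A quick way to check whether a given root.hds really has become sparse is to compare its apparent size with
>       its on-disk usage, e.g. for one of the containers from the vzctl.log excerpts above:
>
>       # ls -l /vz/private/8288448/root.hdd/root.hds
>       # du --block-size=1 /vz/private/8288448/root.hdd/root.hds
>
>       If the figure reported by 'du' is noticeably smaller than the file size shown by 'ls -l', the image contains
>       holes.)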
>
>       We can restore all affected containers from backups, but I wanted to get in touch with the list to see if anyone
>       else at any other site has experienced these or similar issues after applying the 7.0.14 (136) update.
>
>       Thank you,
>       Kevin Drysdale.
> 
> 
> 
> 
> -- 
> Jonathan Wright
> KnownHost, LLC
> https://www.knownhost.com
> 
> 
> 
> 
>




