[Users] Issues after updating to 7.0.14 (136)

Jehan Procaccia IMT jehan.procaccia at imtbs-tsp.eu
Mon Jul 6 21:40:07 MSK 2020


Hello

In case it helps, here is what I have done so far to try to re-enable the dead CTs:

# prlctl stop ldap2
Stopping the CT...
Failed to stop the CT: PRL_ERR_VZCTL_OPERATION_FAILED (Details: Cannot 
lock the Container
)
# cat /vz/lock/144dc737-b4e3-4c03-852c-25a6df06cee4.lck
6227
resuming
# ps auwx | grep 6227
root        6227  0.0  0.0  92140  6984 ?        S    15:10   0:00 
/usr/sbin/vzctl resume 144dc737-b4e3-4c03-852c-25a6df06cee4
# kill -9  6227

I still cannot stop the CT (Cannot lock the Container...).
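
What I am thinking of trying next is to clear the lock by hand, but only
after confirming that the PID recorded in the lock file is really gone.
A rough sketch (removing the .lck file manually is my own assumption, not
a documented procedure):

UUID=144dc737-b4e3-4c03-852c-25a6df06cee4
LCK=/vz/lock/$UUID.lck
PID=$(head -n1 "$LCK")              # first line of the .lck file is the holder PID
if ! ps -p "$PID" > /dev/null 2>&1; then
    echo "lock holder $PID is gone, removing stale lock $LCK"
    rm -f "$LCK"
fi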


# df |grep 144dc737-b4e3-4c03-852c-25a6df06cee4
/dev/ploop11432p1          10188052   2546636    7100848  27% 
/vz/root/144dc737-b4e3-4c03-852c-25a6df06cee4
none                        1048576         0    1048576   0% 
/vz/private/144dc737-b4e3-4c03-852c-25a6df06cee4/dump/Dump/.criu.cgyard.56I2ls
# umount /dev/ploop11432p1

# ploop check -F 
/vz/private/144dc737-b4e3-4c03-852c-25a6df06cee4/root.hdd/root.hds
Reopen rw /vz/private/144dc737-b4e3-4c03-852c-25a6df06cee4/root.hdd/root.hds
Error in ploop_check (check.c:663): Dirty flag is set

# ploop mount 
/vz/private/144dc737-b4e3-4c03-852c-25a6df06cee4/root.hdd/DiskDescriptor.xml
Error in ploop_mount_image (ploop.c:2495): Image 
/vz/private/144dc737-b4e3-4c03-852c-25a6df06cee4/root.hdd/root.hds 
already used by device /dev/ploop11432
# df -H | grep ploop11432
=> nothing
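
Next I want to check whether the image is still attached to a stale ploop
device and, if so, detach it before re-running ploop check. This is only a
sketch; the sysfs layout and the use of 'ploop umount -d' are assumptions
on my part:

IMG=/vz/private/144dc737-b4e3-4c03-852c-25a6df06cee4/root.hdd/root.hds
for f in /sys/block/ploop*/pdelta/0/image; do
    if grep -q "$IMG" "$f"; then
        DEV=/dev/$(echo "$f" | cut -d/ -f4)   # e.g. /dev/ploop11432
        echo "image still attached to $DEV"
        ploop umount -d "$DEV"                # detach the stale device
    fi
done
# then re-run ploop check -F and ploop mount on the same image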

I am lost, any help is appreciated.

Thanks.

On 06/07/2020 at 15:37, Jehan Procaccia IMT wrote:
>
> Hello,
>
> I am back to the initial problem related to this post. Since I updated to 
> OpenVZ release 7.0.14 (136) | Virtuozzo Linux release 7.8.0 (609), I am 
> also facing corrupted CT status.
>
> I don't see the exact same errors as those mentioned by Kevin Drysdale 
> below (ploop/fsck), but I am not able to enter certain CTs, nor can I 
> stop them:
>
> [root@olb ~]# prlctl stop trans8
> Stopping the CT...
> Failed to stop the CT: PRL_ERR_VZCTL_OPERATION_FAILED (Details: Cannot lock the Container
> )
>
> [root@olb ~]# prlctl enter trans8
> Unable to get init pid
> enter into CT failed
>
> exited from CT 02faecdd-ddb6-42eb-8103-202508f18256
>
> For those CTs that fail to enter or stop, I noticed that there is a 
> second device mounted with a name ending in /dump/Dump/.criu.cgyard.4EJB8c:
>
> [root@olb ~]# df -H | grep 02faecdd-ddb6-42eb-8103-202508f18256
> /dev/ploop53152p1          11G    2,2G  7,7G  23% /vz/root/02faecdd-ddb6-42eb-8103-202508f18256
> none                      537M       0  537M   0% /vz/private/02faecdd-ddb6-42eb-8103-202508f18256/dump/Dump/.criu.cgyard.4EJB8c
>
> [root@olb ~]# prlctl list | grep 02faecdd-ddb6-42eb-8103-202508f18256
> {02faecdd-ddb6-42eb-8103-202508f18256}  running 157.159.196.17  CT isptrans8
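>
> As a cross-check of the "running" state reported above, I also want to 
> compare what vzlist says for the same CT (the exact field names used here 
> are my assumption):
>
> vzlist -o ctid,status,name 02faecdd-ddb6-42eb-8103-202508f18256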
>
> I rebooted the whole hardware node; here is the related vzctl.log since 
> the reboot:
>
> 2020-07-06T15:10:38+0200 vzctl : CT 02faecdd-ddb6-42eb-8103-202508f18256 : Removing the stale lock file /vz/lock/02faecdd-ddb6-42eb-8103-202508f18256.lck
> 2020-07-06T15:10:38+0200 vzctl : CT 02faecdd-ddb6-42eb-8103-202508f18256 : Restoring the Container ...
> 2020-07-06T15:10:38+0200 vzctl : CT 02faecdd-ddb6-42eb-8103-202508f18256 : Mount image: /vz/private/02faecdd-ddb6-42eb-8103-202508f18256/root.hdd
> 2020-07-06T15:10:38+0200 : Opening delta /vz/private/02faecdd-ddb6-42eb-8103-202508f18256/root.hdd/root.hds
> 2020-07-06T15:10:38+0200 : Opening delta /vz/private/02faecdd-ddb6-42eb-8103-202508f18256/root.hdd/root.hds
> 2020-07-06T15:10:38+0200 : Opening delta /vz/private/02faecdd-ddb6-42eb-8103-202508f18256/root.hdd/root.hds
> 2020-07-06T15:10:38+0200 : Adding delta dev=/dev/ploop53152 img=/vz/private/02faecdd-ddb6-42eb-8103-202508f18256/root.hdd/root.hds (rw)
> 2020-07-06T15:10:39+0200 : Mounted /dev/ploop53152p1 at /vz/root/02faecdd-ddb6-42eb-8103-202508f18256 fstype=ext4 data=',balloon_ino=12'
> 2020-07-06T15:10:39+0200 vzctl : CT 02faecdd-ddb6-42eb-8103-202508f18256 : Container is mounted
> 2020-07-06T15:10:40+0200 vzctl : CT 02faecdd-ddb6-42eb-8103-202508f18256 : Setting permissions for image=/vz/private/02faecdd-ddb6-42eb-8103-202508f18256/root.hdd
> 2020-07-06T15:10:40+0200 vzctl : CT 02faecdd-ddb6-42eb-8103-202508f18256 : Configure memguarantee: 0%
> 2020-07-06T15:18:12+0200 vzctl : CT 02faecdd-ddb6-42eb-8103-202508f18256 : Unable to get init pid
> 2020-07-06T15:18:12+0200 vzctl : CT 02faecdd-ddb6-42eb-8103-202508f18256 : enter into CT failed
> 2020-07-06T15:19:49+0200 vzctl : CT 02faecdd-ddb6-42eb-8103-202508f18256 : Cannot lock the Container
> 2020-07-06T15:25:33+0200 vzctl : CT 02faecdd-ddb6-42eb-8103-202508f18256 : Unable to get init pid
> 2020-07-06T15:25:33+0200 vzctl : CT 02faecdd-ddb6-42eb-8103-202508f18256 : enter into CT failed
>
> On another CT that fails to enter/stop I see the same kind of logs, plus 
> CRIU errors:
>
> 2020-07-06T15:10:38+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Restoring the Container ...
> 2020-07-06T15:10:38+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Mount image: /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd
> 2020-07-06T15:10:38+0200 : Opening delta /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd/root.hds
> 2020-07-06T15:10:39+0200 : Opening delta /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd/root.hds
> 2020-07-06T15:10:39+0200 : Opening delta /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd/root.hds
> 2020-07-06T15:10:39+0200 : Adding delta dev=/dev/ploop36049 img=/vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd/root.hds (rw)
> 2020-07-06T15:10:41+0200 : Mounted /dev/ploop36049p1 at /vz/root/4ae48335-5b63-475d-8629-c8d742cb0ba0 fstype=ext4 data=',balloon_ino=12'
> 2020-07-06T15:10:41+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Container is mounted
> 2020-07-06T15:10:41+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Setting permissions for image=/vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd
> 2020-07-06T15:10:41+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Configure memguarantee: 0%
> 2020-07-06T15:10:57+0200 vzeventd : Run: /etc/vz/vzevent.d/ve-stop id=4ae48335-5b63-475d-8629-c8d742cb0ba0
> 2020-07-06T15:10:57+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : (03.038774) Error (criu/util.c:666): exited, status=4
> 2020-07-06T15:10:57+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : (14.446513)      1: Error (criu/files.c:230): Empty list on file desc id 0x1f(5)
> 2020-07-06T15:10:57+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : (14.446518)      1: Error (criu/files.c:231): BUG at criu/files.c:231
> 2020-07-06T15:10:57+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : (15.589529) Error (criu/cr-restore.c:1612): 7130 killed by signal 11: Segmentation fault
> 2020-07-06T15:10:57+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : (15.604550) Error (criu/cr-restore.c:2614): Restoring FAILED.
> 2020-07-06T15:10:57+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : The restore log was saved in /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/dump/Dump/restore.log
> 2020-07-06T15:10:57+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : criu exited with rc=17
> 2020-07-06T15:10:57+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Unmount image: /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd (190)
> 2020-07-06T15:10:57+0200 : Unmounting file system at /vz/root/4ae48335-5b63-475d-8629-c8d742cb0ba0
> 2020-07-06T15:11:31+0200 : Opening delta /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd/root.hds
> 2020-07-06T15:11:31+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Container is unmounted
> 2020-07-06T15:11:31+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Failed to restore the Container
> 2020-07-06T15:11:31+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Restoring the Container ...
> 2020-07-06T15:11:31+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Mount image: /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd
> 2020-07-06T15:11:31+0200 : Opening delta /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd/root.hds
> 2020-07-06T15:11:31+0200 : Opening delta /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd/root.hds
> 2020-07-06T15:11:31+0200 : Opening delta /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd/root.hds
> 2020-07-06T15:11:31+0200 : Adding delta dev=/dev/ploop36049 img=/vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd/root.hds (rw)
> 2020-07-06T15:11:31+0200 : Mounted /dev/ploop36049p1 at /vz/root/4ae48335-5b63-475d-8629-c8d742cb0ba0 fstype=ext4 data=',balloon_ino=12'
> 2020-07-06T15:11:31+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Container is mounted
> 2020-07-06T15:11:31+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Setting permissions for image=/vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd
> 2020-07-06T15:11:31+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Configure memguarantee: 0%
> 2020-07-06T15:14:18+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : Unable to get init pid
> 2020-07-06T15:14:18+0200 vzctl : CT 4ae48335-5b63-475d-8629-c8d742cb0ba0 : enter into CT failed
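>
> If it is the CRIU restore itself that keeps failing, I am tempted to move 
> the suspended-state dump aside so that the next start is a normal boot 
> instead of a restore. This is only an idea on my part and it throws away 
> the suspended state, so it would be for already-broken CTs only:
>
> UUID=4ae48335-5b63-475d-8629-c8d742cb0ba0
> mv /vz/private/$UUID/dump/Dump /vz/private/$UUID/dump/Dump.broken   # keep it, don't delete
> prlctl start $UUID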
>
> In prl-disp.log:
>
> 07-06 15:10:30.797 F /virtuozzo:4836:4836/ register CT: 4ae48335-5b63-475d-8629-c8d742cb0ba0
> 07-06 15:10:38.717 F /disp:4836:6163/ Processing command 'DspCmdVmStartEx' 1036 for CT uuid='{4ae48335-5b63-475d-8629-c8d742cb0ba0}'
> 07-06 15:10:38.738 I /virtuozzo:4836:6234/ /usr/sbin/vzctl resume 4ae48335-5b63-475d-8629-c8d742cb0ba0
> 07-06 15:10:48.542 I /disp:4836:5196/ vzevent: state=6, envid=4ae48335-5b63-475d-8629-c8d742cb0ba0
> 07-06 15:10:57.364 I /disp:4836:5196/ vzevent: state=8, envid=4ae48335-5b63-475d-8629-c8d742cb0ba0
> 07-06 15:10:57.475 I /disp:4836:5196/ vzevent: state=12, envid=4ae48335-5b63-475d-8629-c8d742cb0ba0
> 07-06 15:11:31.161 F /virtuozzo:4836:6234/ /usr/sbin/vzctl utility failed: /usr/sbin/vzctl resume 4ae48335-5b63-475d-8629-c8d742cb0ba0 [6]
> Mount image: /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd
> Setting permissions for image=/vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd
> Unmount image: /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/root.hdd (190)
> The restore log was saved in /vz/private/4ae48335-5b63-475d-8629-c8d742cb0ba0/dump/Dump/restore.log
> 07-06 15:11:31.162 I /virtuozzo:4836:6234/ /usr/sbin/vzctl start 4ae48335-5b63-475d-8629-c8d742cb0ba0
>
> Is this related to the update? How can I re-enable those CTs?
>
> Thanks.
>
>>
>>
>> On 29/06/2020 at 12:30, Kevin Drysdale wrote:
>>> Hello,
>>>
>>> After updating one of our OpenVZ VPS hosting nodes at the end of 
>>> last week, we've started to have issues with corruption apparently 
>>> occurring inside containers.  Issues of this nature have never 
>>> affected the node previously, and there do not appear to be any 
>>> hardware issues that could explain this.
>>>
>>> Specifically, a few hours after updating, we began to see containers 
>>> experiencing errors such as this in the logs:
>>>
>>> [90471.678994] EXT4-fs (ploop35454p1): error count since last fsck: 25
>>> [90471.679022] EXT4-fs (ploop35454p1): initial error at time 
>>> 1593205255: ext4_ext_find_extent:904: inode 136399
>>> [90471.679030] EXT4-fs (ploop35454p1): last error at time 
>>> 1593232922: ext4_ext_find_extent:904: inode 136399
>>> [95189.954569] EXT4-fs (ploop42983p1): error count since last fsck: 67
>>> [95189.954582] EXT4-fs (ploop42983p1): initial error at time 
>>> 1593210174: htree_dirblock_to_tree:918: inode 926441: block 3683060
>>> [95189.954589] EXT4-fs (ploop42983p1): last error at time 
>>> 1593276902: ext4_iget:4435: inode 1849777
>>> [95714.207432] EXT4-fs (ploop60706p1): error count since last fsck: 42
>>> [95714.207447] EXT4-fs (ploop60706p1): initial error at time 
>>> 1593210489: ext4_ext_find_extent:904: inode 136272
>>> [95714.207452] EXT4-fs (ploop60706p1): last error at time 
>>> 1593231063: ext4_ext_find_extent:904: inode 136272
>>>
>>> Shutting the containers down and manually mounting and e2fsck'ing 
>>> their filesystems did clear these errors, but each of the containers 
>>> (which were mostly used for running Plesk) had widespread issues 
>>> with corrupt or missing files after the fscks completed, 
>>> necessitating their restoration from backup.
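>>>
>>> Roughly, the manual pass on each affected container looked like the 
>>> following (a sketch only; the container ID is one from the logs below, 
>>> and the ploop device name is a placeholder that differs per container):
>>>
>>> CT=8288448
>>> vzctl stop "$CT"
>>> ploop mount /vz/private/"$CT"/root.hdd/DiskDescriptor.xml
>>> # note the /dev/ploopNNNN device that ploop reports as added
>>> e2fsck -f /dev/ploopNNNNp1
>>> ploop umount /vz/private/"$CT"/root.hdd/DiskDescriptor.xml
>>> vzctl start "$CT"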
>>>
>>> Concurrently, we also began to see messages like this appearing in 
>>> /var/log/vzctl.log, which again have never appeared at any point 
>>> prior to this update being installed:
>>>
>>> /var/log/vzctl.log:2020-06-26T21:05:19+0100 : Error in fill_hole 
>>> (check.c:240): Warning: ploop image 
>>> '/vz/private/8288448/root.hdd/root.hds' is sparse
>>> /var/log/vzctl.log:2020-06-26T21:09:41+0100 : Error in fill_hole 
>>> (check.c:240): Warning: ploop image 
>>> '/vz/private/8288450/root.hdd/root.hds' is sparse
>>> /var/log/vzctl.log:2020-06-26T21:16:22+0100 : Error in fill_hole 
>>> (check.c:240): Warning: ploop image 
>>> '/vz/private/8288451/root.hdd/root.hds' is sparse
>>> /var/log/vzctl.log:2020-06-26T21:19:57+0100 : Error in fill_hole 
>>> (check.c:240): Warning: ploop image 
>>> '/vz/private/8288452/root.hdd/root.hds' is sparse
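>>>
>>> For what it's worth, a quick way to confirm whether one of these images 
>>> really has become sparse is to compare its apparent size with its 
>>> allocated size (standard GNU stat; the path is just the first image 
>>> from the warnings above):
>>>
>>> IMG=/vz/private/8288448/root.hdd/root.hds
>>> stat -c 'apparent=%s bytes  allocated=%b blocks of %B bytes' "$IMG"
>>> # if %b * %B is clearly smaller than %s, the image contains holes (sparse)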
>>>
>>> The basic procedure we follow when updating our nodes is as follows 
>>> (a rough sketch of the migration step appears after this list):
>>>
>>> 1. Update the standby node we keep spare for this process
>>> 2. vzmigrate all containers from the live node being updated to the 
>>> standby node
>>> 3. Update the live node
>>> 4. Reboot the live node
>>> 5. vzmigrate the containers from the standby node back to the live 
>>> node they originally came from
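>>>
>>> (As a sketch, step 2 is essentially a loop like the one below; the 
>>> standby node name and the use of plain vzmigrate with default options 
>>> are assumptions for illustration only:)
>>>
>>> for CT in $(vzlist -H -o ctid); do
>>>     vzmigrate standby-node "$CT"
>>> done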
>>>
>>> The only tool which has been used to affect these containers is 
>>> 'vzmigrate' itself, so I'm at something of a loss as to how the 
>>> root.hdd images for these containers came to contain sparse gaps. 
>>> Creating sparse images is something we have never done, as we have 
>>> always been aware that OpenVZ does not support their use inside a 
>>> container's hard drive image.  And the fact that these images have 
>>> suddenly become sparse at the same time they have started to exhibit 
>>> filesystem corruption is somewhat concerning.
>>>
>>> We can restore all affected containers from backups, but I wanted to 
>>> get in touch with the list to see if anyone else at any other site 
>>> has experienced these or similar issues after applying the 7.0.14 
>>> (136) update.
>>>
>>> Thank you,
>>> Kevin Drysdale.
>>>
