[Users] Unable to perform migration between two fairly new OpenVZ 7 nodes

Pavel Vokhmyanin pvokhmyanin at virtuozzo.com
Mon Mar 22 14:41:44 MSK 2021


Unfortunately, that’s a separate problem: CRIU lacks user namespace support at the moment. It wouldn’t prevent phaul-client from starting on the source.
It’s a bit tricky to investigate without remote access.

Since you’re saying it’s a clean fresh install, I might be able to replicate the issue in my sandbox. I’ll set up two VMs with OpenVZ 7u15 and see how migration works for me.

Pavel Vokhmyanin
Software Developer, Virtuozzo R&D

Otradnaya street 2b/9, “Otradnoe” Techno Park | Moscow | Russia
Phone: +7 (495) 139 80 17, ext 77449 | pvokhmyanin at virtuozzo.com
Skype: pvokhmyanin

Virtuozzo.com<https://virtuozzo.com/>

From: users-bounces at openvz.org [mailto:users-bounces at openvz.org] On Behalf Of Joe Dougherty
Sent: Monday, March 22, 2021 1:48 PM
To: OpenVZ users
Subject: Re: [Users] Unable to perform migration between two fairly new OpenVZ 7 nodes

It looks like the issue might be with the destination server. CRIU works fine on the source server, but I get this error when trying to suspend and restore on the destination server:

(00.134699) Error (criu/namespaces.c:481): Can't dump nested user namespace for 5051
(00.134708) Error (criu/namespaces.c:836): Can't make userns id
(00.140699) Error (criu/util.c:684): exited, status=1
(00.142811) Error (criu/util.c:684): exited, status=1
(00.144311) Error (criu/cr-dump.c:2275): Dumping FAILED.
Failed to checkpoint the Container
All dump files and logs were saved to /vz/private/10597/dump/Dump.fail
Checkpointing failed
Container is already running
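
To pull just the failing steps out of a CRIU log like the one above, a small parser can help. This is a minimal sketch: the line format is inferred only from the excerpt in this message, and `parse_criu_errors` is a hypothetical helper name, not part of CRIU or vzctl.

```python
import re

# Line shape inferred from the excerpt above (an assumption, not a CRIU spec):
# (seconds) Error (source_file:line): message
ERROR_LINE = re.compile(
    r"^\((?P<time>\d+\.\d+)\) Error "
    r"\((?P<file>[^:()]+):(?P<line>\d+)\): (?P<msg>.*)$"
)


def parse_criu_errors(log_text):
    """Return a list of (time, file, line, message), one per Error line."""
    errors = []
    for raw in log_text.splitlines():
        match = ERROR_LINE.match(raw.strip())
        if match:
            errors.append((
                float(match.group("time")),
                match.group("file"),
                int(match.group("line")),
                match.group("msg"),
            ))
    return errors


sample = """\
(00.134699) Error (criu/namespaces.c:481): Can't dump nested user namespace for 5051
(00.134708) Error (criu/namespaces.c:836): Can't make userns id
(00.144311) Error (criu/cr-dump.c:2275): Dumping FAILED.
Failed to checkpoint the Container
"""

# Show the earliest error, which is usually the best place to start reading.
first = parse_criu_errors(sample)[0]
print(first[1], first[2], first[3])
```

In the excerpt above, the earliest Error line is the nested user namespace failure; the later lines appear to follow from it.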

I also confirmed that this command works on both servers: "python /usr/lib/python2.7/site-packages/phaul/shell/phaul_client.py --help"

On Mon, Mar 22, 2021 at 6:40 AM Pavel Vokhmyanin <pvokhmyanin at virtuozzo.com> wrote:
I see, versions are correct, no problems there.

It might be a bit awkward to debug this further over email, but let’s try a couple of basic tests.

Can you tell me whether “vzctl suspend %CTID%; vzctl resume %CTID%” succeeds? This tests whether CRIU has all its dependencies and is operational.
Also, is there an exception if you run “python /usr/lib/python2.7/site-packages/phaul/shell/phaul_client.py --help”?

Right now we know that phaul-client doesn’t start on your source server, but we can’t tell exactly what the problem is. Hopefully these tests will give us more clues.
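
The suspend/resume test above could be scripted roughly as follows. This is a sketch under assumptions: `check_suspend_resume` is a hypothetical helper, it relies only on the two `vzctl` subcommands quoted above, and it treats a non-zero exit status as failure. Unlike the `;`-separated one-liner, it stops at the first failing step.

```python
import subprocess


def check_suspend_resume(ctid, run=subprocess.run):
    """Run 'vzctl suspend' and then 'vzctl resume' for one container.

    Returns None if both steps exit with status 0, otherwise the name of
    the first step that failed. 'run' is injectable so the logic can be
    exercised on a machine without vzctl installed.
    """
    for action in ("suspend", "resume"):
        result = run(["vzctl", action, str(ctid)])
        if result.returncode != 0:
            # Stop here; the shell one-liner would still attempt the resume.
            return action
    return None
```

On a node, calling `check_suspend_resume(10597)` as root would run both commands against the container from this thread and report which step, if any, failed.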

From: users-bounces at openvz.org [mailto:users-bounces at openvz.org] On Behalf Of Joe Dougherty
Sent: Monday, March 22, 2021 12:56 PM
To: OpenVZ users
Subject: Re: [Users] Unable to perform migration between two fairly new OpenVZ 7 nodes

Source server:

# rpm -q vzmigrate phaul
vzmigrate-7.0.138-1.vz7.x86_64
phaul-0.1.76-1.vz7.noarch

Destination server:

# rpm -q vzmigrate phaul
vzmigrate-7.0.138-1.vz7.x86_64
phaul-0.1.76-1.vz7.noarch
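
Comparing the two listings by eye works for two packages, but the check can also be scripted. The sketch below assumes the default `name-version-release.arch` layout shown in the output above; `parse_rpm_query` is a hypothetical helper name.

```python
def parse_rpm_query(output):
    """Map package name -> version from 'rpm -q' output lines.

    Assumes the name-version-release.arch layout seen above. Prompt lines
    and lines that do not fit that layout are skipped.
    """
    versions = {}
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.rsplit("-", 2)
        if len(parts) != 3:
            continue
        name, version, _release_arch = parts
        versions[name] = version
    return versions


source = parse_rpm_query("""
# rpm -q vzmigrate phaul
vzmigrate-7.0.138-1.vz7.x86_64
phaul-0.1.76-1.vz7.noarch
""")

destination = parse_rpm_query("""
# rpm -q vzmigrate phaul
vzmigrate-7.0.138-1.vz7.x86_64
phaul-0.1.76-1.vz7.noarch
""")

# Report whether both nodes carry the same vzmigrate and phaul versions.
print(source == destination)
```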

On Mon, Mar 22, 2021 at 5:50 AM Pavel Vokhmyanin <pvokhmyanin at virtuozzo.com> wrote:
Hello Joe,

These symptoms indicate that phaul failed to start.
Recently the arguments for phaul changed. If you have an old vzmigrate and a new phaul, or vice versa, you could get this behavior.
You should have either phaul <=0.1.78 + vzmigrate <=7.0.140 OR phaul 0.1.79 + vzmigrate >=7.0.142.

Can you tell me which package versions you’re using on the source server?
# rpm -q vzmigrate phaul
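
The pairing rule above can be written down as a small check. This is only a sketch: `versions_compatible` is a hypothetical helper, "phaul 0.1.79" is read here as "0.1.79 or newer" (an assumption), and any combination the rule does not mention, such as vzmigrate 7.0.141, is reported as not compatible.

```python
def parse_version(text):
    """Turn '7.0.138' into (7, 0, 138) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def versions_compatible(phaul, vzmigrate):
    """Apply the phaul/vzmigrate pairing rule stated above."""
    p = parse_version(phaul)
    v = parse_version(vzmigrate)
    # Old argument format: phaul <= 0.1.78 with vzmigrate <= 7.0.140.
    old_pair = p <= (0, 1, 78) and v <= (7, 0, 140)
    # New argument format. Assumption: "phaul 0.1.79" means 0.1.79 or newer.
    new_pair = p >= (0, 1, 79) and v >= (7, 0, 142)
    return old_pair or new_pair


# Example with an old phaul paired with an old vzmigrate.
print(versions_compatible("0.1.76", "7.0.138"))
```

Run against each node separately, a mismatch on either side would point to the argument change described above.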

From: users-bounces at openvz.org [mailto:users-bounces at openvz.org] On Behalf Of Joe Dougherty
Sent: Monday, March 22, 2021 2:26 AM
To: OpenVZ users
Subject: [Users] Unable to perform migration between two fairly new OpenVZ 7 nodes

I'm attempting to perform migrations between two nodes, one built about a month ago and the other built this morning. I can migrate the containers if I stop them first, but attempts to migrate them while they are powered on (both with and without --live) fail due to phaul-service.

The command I'm running:

vzmigrate -vvv --keep-dst --ssh="-p 2200 -i /root/.ssh/key" server2 10597

Here's the tail end of the output where it fails:

2021-03-21 19:13:08.979: Warm migration stage started
2021-03-21 19:13:08.979: Compression is enabled
2021-03-21 19:13:09.063: Io multiplexer peer aborted
2021-03-21 19:13:09.063: 2021-03-21 16:13:10.778: Phaul service failed to live migrate CT
2021-03-21 19:13:09.064: 2021-03-21 16:13:10.778: cmd 'runphaulmigr' error [-73] : Phaul service failed to live migrate CT
2021-03-21 19:13:09.064: Phaul service failed to live migrate CT
2021-03-21 19:13:09.064: Phaul failed to live migrate CT (/var/log/phaul.log)
2021-03-21 19:13:09.064: 2021-03-21 16:13:10.779: cleaning : rename : /vz/private/10597 -> /vz/private/10597.migrated
2021-03-21 19:13:09.064: 2021-03-21 16:13:10.779: cleaning : destroy CT 10597
2021-03-21 19:13:09.073: 2021-03-21 16:13:10.788: cleaning : 'rmdir' dir : /vz/root/10597
2021-03-21 19:13:09.074: 2021-03-21 16:13:10.788: can not find entry for delete : [/vz/root/10597]
2021-03-21 19:13:09.074: 2021-03-21 16:13:10.788: cleaning : rename : /vz/private/10597 -> /vz/private/10597.migrated
2021-03-21 19:13:09.074: 2021-03-21 16:13:10.788: can not move '/vz/private/10597' -> '/vz/private/10597.migrated' : No such file or directory
2021-03-21 19:13:09.074: 2021-03-21 16:13:10.788: Can't do correct cleaning: can not move '/vz/private/10597' -> '/vz/private/10597.migrated' : No such file or directory
2021-03-21 19:13:09.074: 2021-03-21 16:13:10.788: unlocking 10597
2021-03-21 19:13:09.075: Can't move/copy CT 10597 -> CT 10597, [], [] : Phaul failed to live migrate CT (/var/log/phaul.log)
2021-03-21 19:13:09.075: cleaning : 'rm' file : /vz/dump/10597-criu_err.log
2021-03-21 19:13:09.075: can not find entry for delete : [/vz/dump/10597-criu_err.log]
2021-03-21 19:13:10.075: unlocking 10597
2021-03-21 19:13:10.075: close channel

There is no log file.

# cat /var/log/phaul.log
cat: /var/log/phaul.log: No such file or directory

Any ideas on how I can fix this so I can begin performing migrations without having to power off the containers first?

Thank you.

-Joe

_______________________________________________
Users mailing list
Users at openvz.org
https://lists.openvz.org/mailman/listinfo/users


--
-Joe Dougherty
Chief Operating Officer
Secure Dragon LLC
www.SecureDragon.net<http://www.SecureDragon.net>

--
-Joe