[Users] Persistent failure of live migration

Konstantin Khorenko khorenko at virtuozzo.com
Sun Jun 16 14:34:44 MSK 2019


On 06/15/2019 08:21 PM, jjs - mainphrame wrote:
> Greetings -
>
> Live migration, which worked beautifully with openvz 10 years ago, has stopped working in the current openvz 7 environment.
>
> When I first built ovz 7 servers a few years ago, live migration worked as it should. Within the past few months it stopped working. Thinking it might be a problem with tighter requirements on CPU matching, I repurposed the AMD-based ovz host and replaced it, so that both ovz hosts would be running on Intel hardware.
>
> Unfortunately that did not change the issue. Note below: both hosts are running up-to-date ovz-7 non-factory.
>
> The command -
> [root@annie ~]# vzmigrate --online --nodeps hachi 1987
> Connecttion to destination node (hachi) is successfully established
> Moving/copying CT 1987 -> CT 1987, [], [] ...
> locking 1987
> Checking bindmounts
> Check cluster ID
> Checking keep dir for private area copy
> Checking technologies
> Checking IP addresses on destination node
> Checking RATE parameters in config
> Checking ploop format 2
> copy CT private /vz/private/1987
> Live migration stage started
> Compression is enabled
> Phaul service failed to live migrate CT
> Phaul failed to live migrate CT (/var/log/phaul.log)
> Can't move/copy CT 1987 -> CT 1987, [], [] : Phaul failed to live migrate CT (/var/log/phaul.log)
> unlocking 1987
> [root@annie ~]#
>
> Contents of phaul.log -
> --
> 10:01:16.569: 17149:
> 10:01:16.569: 17149:
> 10:01:16.570: 17149:
> 10:01:16.570: 17149: Starting p.haul
> 10:01:16.570: 17149: Use existing connections, fdrpc=11 fdmem=13 fdfs=root.hdd/root.hds:15
> 10:01:16.589: 17149: Setting up local
> 10:01:16.589: 17149: Loading config file from /etc/vz/conf/
> 10:01:16.590: 17149: Initialize ploop hauler
> 10:01:16.590: 17149: `- /vz/private/1987/root.hdd/root.hds
> 10:01:16.616: 17149: Passing (ctl:12, data:10) pair to CRIU
> 10:01:16.616: 17149: Set maximum number of open file descriptors to 1048576
> 10:01:16.618: 17149: Setting up remote
> 10:01:16.704: 17149: Start migration in live mode
> 10:01:16.704: 17149: Checking criu version
> 10:01:16.757: 17149: Checking for Dirty Tracking
> 10:01:16.758: 17149: `- Explicitly enabled
> 10:01:16.758: 17149: Preliminary FS migration
> 10:01:34.968: 17149: Fs driver transfer 1327497216 bytes (~1266Mb)
> 10:01:34.968: 17149: * Iteration 0
> 10:01:35.074: 17149: Making directory /vz/dump/1987/dmp-VTSUxn-19.06.15-10.01/img/1
> 10:01:35.075: 17149: Issuing pre-dump command to service
> 10:01:36.120: 17149: Dumped 28561 pages, 0 skipped
> 10:01:36.120: 17149: Fs driver transfer 0 bytes
> 10:01:36.120: 17149: Checking iteration progress:
> 10:01:36.120: 17149: > Proceed to next iteration
> 10:01:36.120: 17149: * Iteration 1
> 10:01:36.122: 17149: Making directory /vz/dump/1987/dmp-VTSUxn-19.06.15-10.01/img/2
> 10:01:36.122: 17149: Issuing pre-dump command to service
> 10:01:37.751: 17149: Dumped 360 pages, 28201 skipped
> 10:01:37.751: 17149: Fs driver transfer 0 bytes
> 10:01:37.751: 17149: Checking iteration progress:
> 10:01:37.751: 17149: > Proceed to next iteration
> 10:01:37.751: 17149: * Iteration 2
> 10:01:37.754: 17149: Making directory /vz/dump/1987/dmp-VTSUxn-19.06.15-10.01/img/3
> 10:01:37.754: 17149: Issuing pre-dump command to service
> 10:01:38.485: 17149: Dumped 361 pages, 28200 skipped
> 10:01:38.485: 17149: Fs driver transfer 0 bytes
> 10:01:38.485: 17149: Checking iteration progress:
> 10:01:38.485: 17149: > Too many iterations
> 10:01:38.485: 17149: Final dump and restore
> 10:01:38.487: 17149: Making directory /vz/dump/1987/dmp-VTSUxn-19.06.15-10.01/img/4
> 10:01:38.545: 17149: Issuing dump command to service
> 10:01:38.547: 17149: Notify (pre-dump)
> 10:01:38.555: 17149: Notify (network-lock)
> 10:01:38.579: 17149: Action script /usr/libexec/criu/scripts/nfs-ports-allow.sh finished with exit code 0
> 10:01:38.580: 17149: Notify (post-network-lock)
> 10:01:41.047: 17149: Final FS and images sync
> 10:01:41.441: 17149: Sending images to target
> 10:01:41.442: 17149: Pack
> 10:01:41.493: 17149: Add htype images
> 10:01:41.722: 17149: Asking target host to restore
> 10:01:42.635: 17149: Remote exception
> 10:01:42.636: 17149: I/O operation on closed file
> Traceback (most recent call last):
>   File "/usr/libexec/phaul/p.haul", line 9, in <module>
>     load_entry_point('phaul==0.1', 'console_scripts', 'p.haul')()
>   File "/usr/lib/python2.7/site-packages/phaul/shell/phaul_client.py", line 49, in main
>     worker.start_migration()
>   File "/usr/lib/python2.7/site-packages/phaul/iters.py", line 161, in start_migration
>     self.__start_live_migration()
>   File "/usr/lib/python2.7/site-packages/phaul/iters.py", line 232, in __start_live_migration
>     self.target_host.restore_from_images()
>   File "/usr/lib/python2.7/site-packages/phaul/xem_rpc_client.py", line 26, in __call__
>     raise Exception(resp[1])
> Exception: I/O operation on closed file
> --
>
> Dump directory contents are also available -
>
> Shall I open a bug report?

Yes, file a bug please.
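For the report, the useful parts of phaul.log are how the pre-dump iterations converged and which step was running when the target side raised the exception. A small script can pull those out. The following is a minimal Python 3 sketch, not part of p.haul or vzmigrate: it assumes the `HH:MM:SS.mmm: PID: message` line layout seen in the log above, recognises only the message formats that appear in that log, and the names `summarize_phaul_log`, `PhaulSummary` and `EXAMPLE_LOG` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed layout, based only on the log quoted above: "HH:MM:SS.mmm: PID: message"
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}): (\d+):\s?(.*)$")
ITER_RE = re.compile(r"^\* Iteration (\d+)$")
DUMP_RE = re.compile(r"^Dumped (\d+) pages, (\d+) skipped$")

# Short excerpt of the phaul.log above, used as a self-contained example input.
EXAMPLE_LOG = """\
10:01:34.968: 17149: * Iteration 0
10:01:36.120: 17149: Dumped 28561 pages, 0 skipped
10:01:36.120: 17149: > Proceed to next iteration
10:01:36.120: 17149: * Iteration 1
10:01:37.751: 17149: Dumped 360 pages, 28201 skipped
10:01:37.751: 17149: * Iteration 2
10:01:38.485: 17149: Dumped 361 pages, 28200 skipped
10:01:38.485: 17149: > Too many iterations
10:01:41.722: 17149: Asking target host to restore
10:01:42.635: 17149: Remote exception
10:01:42.636: 17149: I/O operation on closed file
Traceback (most recent call last):
"""


@dataclass
class PhaulSummary:
    # (iteration index, pages dumped, pages skipped) for each pre-dump pass
    iterations: List[Tuple[int, int, int]] = field(default_factory=list)
    last_step: Optional[str] = None  # message logged just before "Remote exception"
    error: Optional[str] = None      # first message logged after "Remote exception"


def summarize_phaul_log(text: str) -> PhaulSummary:
    """Hypothetical helper: extracts pre-dump statistics and the failing step."""
    summary = PhaulSummary()
    current_iter: Optional[int] = None
    previous_msg: Optional[str] = None
    after_exception = False
    for raw in text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # traceback and other unprefixed lines are ignored
        msg = m.group(3).strip()
        if not msg:
            continue  # the log above starts with a few empty timestamped lines
        if after_exception:
            if summary.error is None:
                summary.error = msg
            continue
        if msg == "Remote exception":
            after_exception = True
            summary.last_step = previous_msg
            continue
        it = ITER_RE.match(msg)
        if it:
            current_iter = int(it.group(1))
        else:
            d = DUMP_RE.match(msg)
            if d and current_iter is not None:
                summary.iterations.append(
                    (current_iter, int(d.group(1)), int(d.group(2))))
        previous_msg = msg
    return summary
```

On a real host, pass the file contents, e.g. `summarize_phaul_log(open("/var/log/phaul.log").read())`. For the log quoted above it reports three pre-dump passes (28561, 360 and 361 pages dumped), "Asking target host to restore" as the last step, and "I/O operation on closed file" as the error, which points at the restore stage on the destination rather than at the memory pre-copy.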

--
Best regards,

Konstantin Khorenko,
Virtuozzo Linux Kernel Team