[Users] Persistent failure of live migration

jjs - mainphrame jjs at mainphrame.com
Sat Jun 15 20:21:19 MSK 2019


Greetings -

Live migration, which worked beautifully with OpenVZ 10 years ago, has
stopped working in the current OpenVZ 7 environment.

When I first built ovz 7 servers a few years ago, live migration worked as
it should. Within the past few months it stopped working. Thinking it might
be a problem with tighter requirements on CPU matching, I repurposed the
AMD-based ovz host and replaced it, so that both ovz hosts would be running
on Intel hardware.
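One way to test the CPU-matching theory independently of vzmigrate is to compare the feature flags each host advertises. The sketch below is a hypothetical helper, not how vzmigrate or CRIU perform their own CPU check: it assumes `/proc/cpuinfo` from each host has been saved to text, and it lists flags present on the source but missing on the destination. That is the direction that matters, since a restored process may rely on CPU features it detected while running on the source.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line of a /proc/cpuinfo dump."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 Linux prints lines like "flags\t\t: fpu vme de pse ..."
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_on_destination(source_cpuinfo, dest_cpuinfo):
    """Flags the source CPU has that the destination CPU lacks, sorted for stable output."""
    return sorted(cpu_flags(source_cpuinfo) - cpu_flags(dest_cpuinfo))
```

Usage would be something like `missing_on_destination(open("annie-cpuinfo.txt").read(), open("hachi-cpuinfo.txt").read())`, where the file names are placeholders. An empty list means the destination offers at least every flag the source does. Only the first `flags` line is read, on the assumption that all logical CPUs in a host report the same set.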

Unfortunately, that did not fix the issue. Note: both hosts are
running up-to-date ovz-7 (non-factory).

The command -
[root@annie ~]# vzmigrate --online --nodeps hachi 1987
Connecttion to destination node (hachi) is successfully established
Moving/copying CT 1987 -> CT 1987, [], [] ...
locking 1987
Checking bindmounts
Check cluster ID
Checking keep dir for private area copy
Checking technologies
Checking IP addresses on destination node
Checking RATE parameters in config
Checking ploop format 2
copy CT private /vz/private/1987
Live migration stage started
Compression is enabled
Phaul service failed to live migrate CT
Phaul failed to live migrate CT (/var/log/phaul.log)
Can't move/copy CT 1987 -> CT 1987, [], [] : Phaul failed to live migrate CT (/var/log/phaul.log)
unlocking 1987
[root@annie ~]#

Contents of phaul.log -
--
10:01:16.569: 17149:
10:01:16.569: 17149:
10:01:16.570: 17149:
10:01:16.570: 17149: Starting p.haul
10:01:16.570: 17149: Use existing connections, fdrpc=11 fdmem=13 fdfs=root.hdd/root.hds:15
10:01:16.589: 17149: Setting up local
10:01:16.589: 17149: Loading config file from /etc/vz/conf/
10:01:16.590: 17149: Initialize ploop hauler
10:01:16.590: 17149: `- /vz/private/1987/root.hdd/root.hds
10:01:16.616: 17149: Passing (ctl:12, data:10) pair to CRIU
10:01:16.616: 17149: Set maximum number of open file descriptors to 1048576
10:01:16.618: 17149: Setting up remote
10:01:16.704: 17149: Start migration in live mode
10:01:16.704: 17149: Checking criu version
10:01:16.757: 17149: Checking for Dirty Tracking
10:01:16.758: 17149: `- Explicitly enabled
10:01:16.758: 17149: Preliminary FS migration
10:01:34.968: 17149: Fs driver transfer 1327497216 bytes (~1266Mb)
10:01:34.968: 17149: * Iteration 0
10:01:35.074: 17149: Making directory /vz/dump/1987/dmp-VTSUxn-19.06.15-10.01/img/1
10:01:35.075: 17149: Issuing pre-dump command to service
10:01:36.120: 17149: Dumped 28561 pages, 0 skipped
10:01:36.120: 17149: Fs driver transfer 0 bytes
10:01:36.120: 17149: Checking iteration progress:
10:01:36.120: 17149: > Proceed to next iteration
10:01:36.120: 17149: * Iteration 1
10:01:36.122: 17149: Making directory /vz/dump/1987/dmp-VTSUxn-19.06.15-10.01/img/2
10:01:36.122: 17149: Issuing pre-dump command to service
10:01:37.751: 17149: Dumped 360 pages, 28201 skipped
10:01:37.751: 17149: Fs driver transfer 0 bytes
10:01:37.751: 17149: Checking iteration progress:
10:01:37.751: 17149: > Proceed to next iteration
10:01:37.751: 17149: * Iteration 2
10:01:37.754: 17149: Making directory /vz/dump/1987/dmp-VTSUxn-19.06.15-10.01/img/3
10:01:37.754: 17149: Issuing pre-dump command to service
10:01:38.485: 17149: Dumped 361 pages, 28200 skipped
10:01:38.485: 17149: Fs driver transfer 0 bytes
10:01:38.485: 17149: Checking iteration progress:
10:01:38.485: 17149: > Too many iterations
10:01:38.485: 17149: Final dump and restore
10:01:38.487: 17149: Making directory /vz/dump/1987/dmp-VTSUxn-19.06.15-10.01/img/4
10:01:38.545: 17149: Issuing dump command to service
10:01:38.547: 17149: Notify (pre-dump)
10:01:38.555: 17149: Notify (network-lock)
10:01:38.579: 17149: Action script /usr/libexec/criu/scripts/nfs-ports-allow.sh finished with exit code 0
10:01:38.580: 17149: Notify (post-network-lock)
10:01:41.047: 17149: Final FS and images sync
10:01:41.441: 17149: Sending images to target
10:01:41.442: 17149: Pack
10:01:41.493: 17149: Add htype images
10:01:41.722: 17149: Asking target host to restore
10:01:42.635: 17149: Remote exception
10:01:42.636: 17149: I/O operation on closed file
Traceback (most recent call last):
  File "/usr/libexec/phaul/p.haul", line 9, in <module>
    load_entry_point('phaul==0.1', 'console_scripts', 'p.haul')()
  File "/usr/lib/python2.7/site-packages/phaul/shell/phaul_client.py", line 49, in main
    worker.start_migration()
  File "/usr/lib/python2.7/site-packages/phaul/iters.py", line 161, in start_migration
    self.__start_live_migration()
  File "/usr/lib/python2.7/site-packages/phaul/iters.py", line 232, in __start_live_migration
    self.target_host.restore_from_images()
  File "/usr/lib/python2.7/site-packages/phaul/xem_rpc_client.py", line 26, in __call__
    raise Exception(resp[1])
Exception: I/O operation on closed file
--

The dump directory contents are also available if needed.

Shall I open a bug report?

Jake