[CRIU] [PATCH 12/12] p.haul: implement vz migration with shared ploops

Tue Apr 5 05:00:42 PDT 2016

On Mon, 2016-04-04 at 17:27 +0300, Alexander Burluka wrote:
> Final commit that introduces vz migration with ploops
> that are located on shared disks.
> Brief algorithm description:
> After freezing the container, the source side creates a copy of
> every shared ploop DiskDescriptor.xml (named DiskDescriptor.xml.copy)
> and creates a snapshot delta on both the original and the copy.
> The original's snapshot is used on the destination after a successful
> "vzctl restore" operation (this snapshot delta is created
> on the source and merged on the destination, so it is created as an
> "offline snapshot").
> If restore fails, DiskDescriptor.xml.copy replaces the original file
> and is merged with the copy's snapshot delta.
> 
> Signed-off-by: Alexander Burluka <aburluka at virtuozzo.com>
> ---
>  phaul/iters.py | 39 ++++++++++++++++++++++++---------------
>  1 file changed, 24 insertions(+), 15 deletions(-)
> 
> diff --git a/phaul/iters.py b/phaul/iters.py
> index 9a2f325..200c830 100644
> --- a/phaul/iters.py
> +++ b/phaul/iters.py
> @@ -204,27 +204,36 @@ class phaul_iter_worker:
>  		self.htype.final_dump(root_pid, self.img, self.criu_connection, self.fs)
>  		self.target_host.end_iter()
>  
> -		# Handle final FS and images sync on frozen htype
> -		logging.info("Final FS and images sync")
> -		fsstats = self.fs.stop_migration()
> -		self.img.sync_imgs_to_target(self.target_host, self.htype,
> -			self.connection.mem_sk)
> -
> -		# Restore htype on target
> -		logging.info("Asking target host to restore")
> -		self.target_host.restore_from_images()
> -		logging.info("Restored on target host")
> -
> -		# Ack previous dump request to terminate all frozen tasks
> -		resp = self.criu_connection.ack_notify()
> -		if not resp.success:
> -			raise Exception("Dump screwed up")
> +		try:
> +			# Handle final FS and images sync on frozen htype
> +			logging.info("Final FS and images sync")
> +			fsstats = self.fs.stop_migration()
> +
> +			self.img.sync_imgs_to_target(self.target_host, self.htype,
> +				self.connection.mem_sk)
> +
> +			# Restore htype on target
> +			logging.info("Asking target host to restore")
> +			self.target_host.restore_from_images()
> +			logging.info("Restored on target host")
> +
> +			# Ack previous dump request to terminate all frozen tasks
> +			resp = self.criu_connection.ack_notify()
> +			if not resp.success:
> +				raise Exception("Dump screwed up")

As far as I can see, raising this exception is incorrect. Once
self.target_host.restore_from_images() returns successfully, we can't
fail, since the CT is already running at the destination. Failing here
results in two running instances of the same CT, one on the source and
one on the target. This bug exists in old versions of p.haul as well.

Can you please fix it, since you are reworking the rollbacks? I think we
can replace the raise with a simple error message for now. Xemul, what
do you think, is such a fix correct?
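
Something along these lines is what I have in mind. This is only a rough,
standalone sketch: FakeResp, FakeCriuConnection and ack_dump_after_restore
are hypothetical names made up for illustration, not existing p.haul code.
The only thing taken from the patch is the ack_notify() call and the
resp.success check.

```python
import logging


class FakeResp:
    """Hypothetical stand-in for the response returned by ack_notify()."""

    def __init__(self, success):
        self.success = success


class FakeCriuConnection:
    """Hypothetical stand-in for self.criu_connection in iters.py."""

    def __init__(self, success):
        self._success = success

    def ack_notify(self):
        return FakeResp(self._success)


def ack_dump_after_restore(criu_connection):
    """Ack the dump request once restore has succeeded on the target.

    The CT is already running on the destination at this point, so a
    failed ack is only logged instead of raising (which would trigger
    a rollback on the source).
    """
    resp = criu_connection.ack_notify()
    if not resp.success:
        logging.error("Dump ack failed after restore succeeded on target")
        return False
    return True
```

The caller can still look at the return value if it wants to report the
failure somewhere, but it no longer aborts the migration.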

> +		except:
> +			self.fs.restore_shared_backups()

Please add the same call (restore_shared_backups) to
__start_restart_migration (needed for uniformity).

> +			raise
> +
> +		# cleanup shared disks backup
> +		self.fs.cleanup_shared_backups()

Please add the same call (cleanup_shared_backups) to
__start_restart_migration (needed for uniformity).
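
One possible way to keep both migration paths uniform is to factor the
restore-on-failure / cleanup-on-success pattern into a small helper and
call it from both places. The sketch below is just an assumption about
how that could look: run_with_shared_backups and FakeFs are hypothetical
names, and only restore_shared_backups() and cleanup_shared_backups()
come from the patch.

```python
class FakeFs:
    """Hypothetical stand-in for self.fs; records which calls were made."""

    def __init__(self):
        self.calls = []

    def restore_shared_backups(self):
        self.calls.append("restore")

    def cleanup_shared_backups(self):
        self.calls.append("cleanup")


def run_with_shared_backups(fs, final_steps):
    """Run final_steps(); roll shared disks back on failure, clean up on success."""
    try:
        result = final_steps()
    except BaseException:
        # Mirrors the bare "except:" in the patch: restore the backups
        # on any failure, then re-raise the original exception.
        fs.restore_shared_backups()
        raise
    # Reached only if final_steps() did not raise.
    fs.cleanup_shared_backups()
    return result
```

Both the live migration path and __start_restart_migration could then
pass their final sync/restore steps as final_steps.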

>  
>  		dstats = criu_api.criu_get_dstats(self.img)
>  		migration_stats.handle_iteration(dstats, fsstats)
>  
>  		logging.info("Migration succeeded")
>  		self.htype.umount()
> +		self.target_host.final_cleanup(self.fs.prepare_src_data({}))
>  		migration_stats.handle_stop(self)
>  		self.img.close()
>  		self.criu_connection.close()