[CRIU] [PATCH v3 12/12] p.haul: implement vz migration with shared ploops

Alexander Burluka aburluka at virtuozzo.com
Tue Apr 12 01:36:42 PDT 2016


On Mon, Apr 11, 2016 at 09:38:47PM +0300, Pavel Emelyanov wrote:
> On 04/11/2016 07:29 PM, Alexander Burluka wrote:
> > Final commit that introduces vz migration with ploops
> > that are located on shared disks.
> > Brief algorithm description:
> > After freezing container source side creates copy of
> > every shared ploop DiskDescriptor.xml (named DiskDescriptor.xml.copy)
> > and makes snapshots delta on origin and copy.
> > Origin snapshot would be used on destination after successful
> > "vzctl restore" operation (this snapshot delta is created
> > on source and merged on destination so it is create as "offline
> > snapshot")
> > On restore fail case DiskDescriptor.xml.copy would replace origin file
> > and merged with copy snapshot delta.
> > 
> > Signed-off-by: Alexander Burluka <aburluka at virtuozzo.com>
> > ---
> >  phaul/iters.py | 32 ++++++++++++++++++++++----------
> >  1 file changed, 22 insertions(+), 10 deletions(-)
> > 
> > diff --git a/phaul/iters.py b/phaul/iters.py
> > index cbac565..453360e 100644
> > --- a/phaul/iters.py
> > +++ b/phaul/iters.py
> > @@ -202,27 +202,36 @@ class phaul_iter_worker:
> >  		self.htype.final_dump(root_pid, self.img, self.criu_connection, self.fs)
> >  		self.target_host.end_iter()
> >  
> > -		# Handle final FS and images sync on frozen htype
> > -		logging.info("Final FS and images sync")
> > -		fsstats = self.fs.stop_migration()
> > -		self.img.sync_imgs_to_target(self.target_host, self.htype,
> > -			self.connection.mem_sk)
> > +		try:
> > +			# Handle final FS and images sync on frozen htype
> > +			logging.info("Final FS and images sync")
> > +			fsstats = self.fs.stop_migration()
> > +
> > +			self.img.sync_imgs_to_target(self.target_host, self.htype,
> > +				self.connection.mem_sk)
> >  
> > -		# Restore htype on target
> > -		logging.info("Asking target host to restore")
> > -		self.target_host.restore_from_images()
> > -		logging.info("Restored on target host")
> > +			# Restore htype on target
> > +			logging.info("Asking target host to restore")
> > +			self.target_host.restore_from_images()
> > +			logging.info("Restored on target host")
> > +		except:
> 
> Which exception do we expect here?
> 

Any kind of exception. We catch it, restore original ploop state and
reraise it. That is original phaul behavior but with additional step

> > +			self.fs.restore_shared_backups()
> 
> Why generic callback is called '_backups'?
> 

It's just a draft name. Is it really confusing?
My thoughts were: ploop original state is snapshoted so you may say that
it is saved or backuped. "snapshot" will be way too specific as for me.

> > +			raise
> >  
> >  		# Ack previous dump request to terminate all frozen tasks
> >  		resp = self.criu_connection.ack_notify()
> >  		if not resp.success:
> > -			raise Exception("Dump screwed up")
> > +			logging.warning("Bad notification from target host")
> > +
> > +		# cleanup shared disks backup
> > +		self.fs.cleanup_shared_backups()
> 
> Up above there was self.fs.stop_migration(). Why isn't this callback enough?
> 

This operation makes source restore impossible. So it should be
somewhere after restore_shared_backups method. I made this cleanup as
soon as we got notification about success migration. It can be easily
done later.

> >  
> >  		dstats = criu_api.criu_get_dstats(self.img)
> >  		migration_stats.handle_iteration(dstats, fsstats)
> >  
> >  		logging.info("Migration succeeded")
> >  		self.htype.umount()
> > +		self.target_host.final_cleanup(self.fs.prepare_src_data({}))
> 
> This is probably comment for patch #10, but I'll ask one here -- why do we
> need to carry more args to final_cleanup()? All info needed for target_host
> should already be there.
> 

Ploop snapshot guids are not stored on destination. I did not found more
natural way to deliver it there. Does it look weird?

> >  		migration_stats.handle_stop(self)
> >  		self.img.close()
> >  		self.criu_connection.close()
> > @@ -276,11 +285,14 @@ class phaul_iter_worker:
> >  			logging.info("Started on target host")
> >  
> >  		except:
> > +			self.fs.restore_shared_backups()
> >  			self.htype.start()
> >  			raise
> >  
> >  		logging.info("Migration succeeded")
> > +		self.fs.cleanup_shared_backups()
> >  		self.htype.umount()
> 
> Can we do fs.cleanup_shared_backups() in htype.umount()?
> 

We can but thereafter "umount" name would be wrong. What do you think
about "release" name?

> > +		self.target_host.final_cleanup(self.fs.prepare_src_data({}))
> >  		migration_stats.handle_stop()
> >  
> >  	def __check_live_iter_progress(self, index, dstats, prev_dstats):
> > 
> 

-- 
Regards,
Alexander Burluka


More information about the CRIU mailing list