[CRIU] [PATCH v3 12/12] p.haul: implement vz migration with shared ploops

Tue Apr 12 01:52:10 PDT 2016

On 04/12/2016 11:36 AM, Alexander Burluka wrote:
> On Mon, Apr 11, 2016 at 09:38:47PM +0300, Pavel Emelyanov wrote:
>> On 04/11/2016 07:29 PM, Alexander Burluka wrote:
>>> Final commit that introduces vz migration with ploops
>>> that are located on shared disks.
>>> Brief algorithm description:
>>> After freezing container source side creates copy of
>>> every shared ploop DiskDescriptor.xml (named DiskDescriptor.xml.copy)
>>> and makes snapshots delta on origin and copy.
>>> Origin snapshot would be used on destination after successful
>>> "vzctl restore" operation (this snapshot delta is created
>>> on source and merged on destination so it is create as "offline
>>> snapshot")
>>> On restore fail case DiskDescriptor.xml.copy would replace origin file
>>> and merged with copy snapshot delta.
>>>
>>> Signed-off-by: Alexander Burluka <aburluka at virtuozzo.com>
>>> ---
>>>  phaul/iters.py | 32 ++++++++++++++++++++++----------
>>>  1 file changed, 22 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/phaul/iters.py b/phaul/iters.py
>>> index cbac565..453360e 100644
>>> --- a/phaul/iters.py
>>> +++ b/phaul/iters.py
>>> @@ -202,27 +202,36 @@ class phaul_iter_worker:
>>>  		self.htype.final_dump(root_pid, self.img, self.criu_connection, self.fs)
>>>  		self.target_host.end_iter()
>>>  
>>> -		# Handle final FS and images sync on frozen htype
>>> -		logging.info("Final FS and images sync")
>>> -		fsstats = self.fs.stop_migration()
>>> -		self.img.sync_imgs_to_target(self.target_host, self.htype,
>>> -			self.connection.mem_sk)
>>> +		try:
>>> +			# Handle final FS and images sync on frozen htype
>>> +			logging.info("Final FS and images sync")
>>> +			fsstats = self.fs.stop_migration()
>>> +
>>> +			self.img.sync_imgs_to_target(self.target_host, self.htype,
>>> +				self.connection.mem_sk)
>>>  
>>> -		# Restore htype on target
>>> -		logging.info("Asking target host to restore")
>>> -		self.target_host.restore_from_images()
>>> -		logging.info("Restored on target host")
>>> +			# Restore htype on target
>>> +			logging.info("Asking target host to restore")
>>> +			self.target_host.restore_from_images()
>>> +			logging.info("Restored on target host")
>>> +		except:
>>
>> Which exception do we expect here?
>>
> 
> Any kind of exception. We catch it, restore original ploop state and
> reraise it. That is original phaul behavior but with additional step

So it's a bugfix that deserves separate patch with description, isn't it?

>>> +			self.fs.restore_shared_backups()
>>
>> Why generic callback is called '_backups'?
>>
> 
> It's just a draft name. Is it really confusing?

It is.

> My thoughts were: ploop original state is snapshoted so you may say that
> it is saved or backuped. "snapshot" will be way too specific as for me.

Since we're touching the generic code, let's better introduce a set of needed
htype callbacks that are put in desired places and not tie them with whether
they are about disk images or not.

>>> +			raise
>>>  
>>>  		# Ack previous dump request to terminate all frozen tasks
>>>  		resp = self.criu_connection.ack_notify()
>>>  		if not resp.success:
>>> -			raise Exception("Dump screwed up")
>>> +			logging.warning("Bad notification from target host")
>>> +
>>> +		# cleanup shared disks backup
>>> +		self.fs.cleanup_shared_backups()
>>
>> Up above there was self.fs.stop_migration(). Why isn't this callback enough?
>>
> 
> This operation makes source restore impossible. So it should be
> somewhere after restore_shared_backups method. I made this cleanup as
> soon as we got notification about success migration. It can be easily
> done later.

OK, so we need to notify fs driver about last iteration finish (stop_migration)
_and_ about the whole migration complete. Right?

>>>  
>>>  		dstats = criu_api.criu_get_dstats(self.img)
>>>  		migration_stats.handle_iteration(dstats, fsstats)
>>>  
>>>  		logging.info("Migration succeeded")
>>>  		self.htype.umount()
>>> +		self.target_host.final_cleanup(self.fs.prepare_src_data({}))
>>
>> This is probably comment for patch #10, but I'll ask one here -- why do we
>> need to carry more args to final_cleanup()? All info needed for target_host
>> should already be there.
>>
> 
> Ploop snapshot guids are not stored on destination. I did not found more
> natural way to deliver it there. Does it look weird?

It is :(

>>>  		migration_stats.handle_stop(self)
>>>  		self.img.close()
>>>  		self.criu_connection.close()
>>> @@ -276,11 +285,14 @@ class phaul_iter_worker:
>>>  			logging.info("Started on target host")
>>>  
>>>  		except:
>>> +			self.fs.restore_shared_backups()
>>>  			self.htype.start()
>>>  			raise
>>>  
>>>  		logging.info("Migration succeeded")
>>> +		self.fs.cleanup_shared_backups()
>>>  		self.htype.umount()
>>
>> Can we do fs.cleanup_shared_backups() in htype.umount()?
>>
> 
> We can but thereafter "umount" name would be wrong. What do you think
> about "release" name?

htype.cleanup() or htype.migration_complete() sound OK.

>>> +		self.target_host.final_cleanup(self.fs.prepare_src_data({}))
>>>  		migration_stats.handle_stop()
>>>  
>>>  	def __check_live_iter_progress(self, index, dstats, prev_dstats):
>>>
>>
>