[CRIU] [PATCH v3 12/12] p.haul: implement vz migration with shared ploops
Pavel Emelyanov
xemul at virtuozzo.com
Tue Apr 12 01:52:10 PDT 2016
On 04/12/2016 11:36 AM, Alexander Burluka wrote:
> On Mon, Apr 11, 2016 at 09:38:47PM +0300, Pavel Emelyanov wrote:
>> On 04/11/2016 07:29 PM, Alexander Burluka wrote:
>>> Final commit that introduces vz migration with ploops
>>> that are located on shared disks.
>>> Brief algorithm description:
>>> After freezing container source side creates copy of
>>> every shared ploop DiskDescriptor.xml (named DiskDescriptor.xml.copy)
>>> and makes snapshots delta on origin and copy.
>>> Origin snapshot would be used on destination after successful
>>> "vzctl restore" operation (this snapshot delta is created
>>> on source and merged on destination so it is create as "offline
>>> snapshot")
>>> On restore fail case DiskDescriptor.xml.copy would replace origin file
>>> and merged with copy snapshot delta.
>>>
>>> Signed-off-by: Alexander Burluka <aburluka at virtuozzo.com>
>>> ---
>>> phaul/iters.py | 32 ++++++++++++++++++++++----------
>>> 1 file changed, 22 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/phaul/iters.py b/phaul/iters.py
>>> index cbac565..453360e 100644
>>> --- a/phaul/iters.py
>>> +++ b/phaul/iters.py
>>> @@ -202,27 +202,36 @@ class phaul_iter_worker:
>>> self.htype.final_dump(root_pid, self.img, self.criu_connection, self.fs)
>>> self.target_host.end_iter()
>>>
>>> - # Handle final FS and images sync on frozen htype
>>> - logging.info("Final FS and images sync")
>>> - fsstats = self.fs.stop_migration()
>>> - self.img.sync_imgs_to_target(self.target_host, self.htype,
>>> - self.connection.mem_sk)
>>> + try:
>>> + # Handle final FS and images sync on frozen htype
>>> + logging.info("Final FS and images sync")
>>> + fsstats = self.fs.stop_migration()
>>> +
>>> + self.img.sync_imgs_to_target(self.target_host, self.htype,
>>> + self.connection.mem_sk)
>>>
>>> - # Restore htype on target
>>> - logging.info("Asking target host to restore")
>>> - self.target_host.restore_from_images()
>>> - logging.info("Restored on target host")
>>> + # Restore htype on target
>>> + logging.info("Asking target host to restore")
>>> + self.target_host.restore_from_images()
>>> + logging.info("Restored on target host")
>>> + except:
>>
>> Which exception do we expect here?
>>
>
> Any kind of exception. We catch it, restore original ploop state and
> reraise it. That is original phaul behavior but with additional step
So it's a bugfix that deserves separate patch with description, isn't it?
>>> + self.fs.restore_shared_backups()
>>
>> Why generic callback is called '_backups'?
>>
>
> It's just a draft name. Is it really confusing?
It is.
> My thoughts were: ploop original state is snapshoted so you may say that
> it is saved or backuped. "snapshot" will be way too specific as for me.
Since we're touching the generic code, let's better introduce a set of needed
htype callbacks that are put in desired places and not tie them with whether
they are about disk images or not.
>>> + raise
>>>
>>> # Ack previous dump request to terminate all frozen tasks
>>> resp = self.criu_connection.ack_notify()
>>> if not resp.success:
>>> - raise Exception("Dump screwed up")
>>> + logging.warning("Bad notification from target host")
>>> +
>>> + # cleanup shared disks backup
>>> + self.fs.cleanup_shared_backups()
>>
>> Up above there was self.fs.stop_migration(). Why isn't this callback enough?
>>
>
> This operation makes source restore impossible. So it should be
> somewhere after restore_shared_backups method. I made this cleanup as
> soon as we got notification about success migration. It can be easily
> done later.
OK, so we need to notify fs driver about last iteration finish (stop_migration)
_and_ about the whole migration complete. Right?
>>>
>>> dstats = criu_api.criu_get_dstats(self.img)
>>> migration_stats.handle_iteration(dstats, fsstats)
>>>
>>> logging.info("Migration succeeded")
>>> self.htype.umount()
>>> + self.target_host.final_cleanup(self.fs.prepare_src_data({}))
>>
>> This is probably comment for patch #10, but I'll ask one here -- why do we
>> need to carry more args to final_cleanup()? All info needed for target_host
>> should already be there.
>>
>
> Ploop snapshot guids are not stored on destination. I did not found more
> natural way to deliver it there. Does it look weird?
It is :(
>>> migration_stats.handle_stop(self)
>>> self.img.close()
>>> self.criu_connection.close()
>>> @@ -276,11 +285,14 @@ class phaul_iter_worker:
>>> logging.info("Started on target host")
>>>
>>> except:
>>> + self.fs.restore_shared_backups()
>>> self.htype.start()
>>> raise
>>>
>>> logging.info("Migration succeeded")
>>> + self.fs.cleanup_shared_backups()
>>> self.htype.umount()
>>
>> Can we do fs.cleanup_shared_backups() in htype.umount()?
>>
>
> We can but thereafter "umount" name would be wrong. What do you think
> about "release" name?
htype.cleanup() or htype.migration_complete() sound OK.
>>> + self.target_host.final_cleanup(self.fs.prepare_src_data({}))
>>> migration_stats.handle_stop()
>>>
>>> def __check_live_iter_progress(self, index, dstats, prev_dstats):
>>>
>>
>
More information about the CRIU
mailing list