[CRIU] [PATCH 01/10] p.haul: implement migration over existing connections

Mon Oct 19 01:39:35 PDT 2015

On 10/15/2015 10:20 PM, Tycho Andersen wrote:
> On Thu, Oct 15, 2015 at 12:21:35PM +0300, Pavel Emelyanov wrote:
>> On 10/14/2015 10:27 PM, Tycho Andersen wrote:
>>> Hi Nikita,
>>>
>>> Thanks for this work, it will be very useful for us.
>>>
>>> On Fri, Oct 09, 2015 at 09:11:33PM +0400, Nikita Spiridonov wrote:
>>>> Remove standalone mode; p.haul can now work only over existing
>>>> connections specified via command line arguments as file
>>>> descriptors.
>>>>
>>>> Three arguments are required: --fdrpc for rpc calls, --fdmem for c/r
>>>> images migration and --fdfs for disk migration. Each file descriptor
>>>> is expected to represent a socket opened in blocking mode with domain
>>>> AF_INET and type SOCK_STREAM.
>>>
>>> Do we have to require --fdfs here for anything? I haven't looked
>>> through the code to see why exactly it is required.
>>
>> The --fdfs socket is needed to copy the filesystem, but (!) only when
>> fs migration is actually required.
>> If the storage the container's files are on is shared, then this fd
>> will effectively become unused.
>>
>> I think we can do it like -- one can omit this parameter, but if the
>> htype driver says that fs migration _is_ required, then p.haul will
>> fail with error "no data channel for fs migration". Does this sound
>> OK to you?
> 
> Yep, that sounds fine.
> 
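
A minimal sketch of that check, just to make the proposal concrete. The
--fdrpc, --fdmem and --fdfs options are the ones from the patch; the
function names and the fs_migration_required flag (standing in for
whatever the htype driver would report) are made up for illustration and
are not existing p.haul code:

```python
import argparse


def parse_args(argv):
    # --fdrpc and --fdmem stay mandatory; --fdfs becomes optional.
    parser = argparse.ArgumentParser(prog="p.haul")
    parser.add_argument("--fdrpc", type=int, required=True)
    parser.add_argument("--fdmem", type=int, required=True)
    parser.add_argument("--fdfs", type=int, default=None)
    return parser.parse_args(argv)


def check_fs_channel(fs_migration_required, fdfs):
    # Hypothetical helper: fail early if the driver needs to copy the
    # filesystem but no descriptor was given for it.
    if fs_migration_required and fdfs is None:
        raise RuntimeError("no data channel for fs migration")


if __name__ == "__main__":
    args = parse_args(["--fdrpc", "3", "--fdmem", "4"])
    # Shared storage: nothing to copy, so a missing --fdfs is fine.
    check_fs_channel(False, args.fdfs)
```

With shared storage the call passes silently; with a driver that does
need fs migration the same arguments would raise the error above.
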
>>> In LXD (and I guess openvz as well, with your ploop patch) we are
>>> managing our own storage backends, and have our own mechanism for
>>> transporting the rootfs. 
>>
>> Can you shed more light on this? :) If there's some backend that can
>> be used by us as well, maybe it would make sense to put migration code
>> into p.haul?
> 
> Right now we have backends for zfs, lvm, btrfs, and just a regular
> directory on a filesystem. I'm not aware of us planning support for
> any other backends right now, but it's not out of the question.
> Additionally, we also want to migrate a container's snapshots when we
> migrate the container, which requires something to know about how we
> handle snapshotting for these various storage backends as well.

Yup, pretty much the same for us :)

> We also support non-live copying containers, so we need the code even
> without p.haul and ideally it would be good not to maintain it in two
> places, but,

You mean off-line copying of a container across nodes?

>>> Ideally, I could invoke p.haul over an fd to
>>> just do the criu iterative piece, and potentially do some callbacks to
>>> tell LXD when the process is stopped so that we can do a final fs
>>> sync.
>>
>> The issue with fs sync is tightly coupled with memory migration iterations,
>> that's why I planned to put all this stuff into p.haul. If you do the
>> final fs sync and while doing this the amount of memory to be copied
>> increases, it might make sense to do one more iteration of pre-copy.
>> Without full p.haul control over both (memory and fs) it's hardly possible.
> 
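
To illustrate the coupling, a toy sketch of the decision I have in mind.
This is not p.haul's actual logic; the function name, the page counters
and the max_growth threshold are all assumptions picked for the example:

```python
def needs_extra_precopy(dirty_before_sync, dirty_after_sync, max_growth=1.5):
    """Decide whether one more pre-copy iteration is worth doing.

    dirty_before_sync / dirty_after_sync are the amounts of memory
    (e.g. in pages) still to be copied before and after the final fs
    sync. max_growth is an arbitrary illustrative threshold.
    """
    if dirty_before_sync == 0:
        # Anything dirtied during the sync is pure growth.
        return dirty_after_sync > 0
    return dirty_after_sync / dirty_before_sync > max_growth
```

Only a component that sees both the fs sync and the memory counters can
evaluate something like this, which is the reason for keeping both
under p.haul's control.
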
> What about passing p.haul a socket and inventing a messaging protocol?
> Then p.haul could ask LXD (or whoever) to sync the filesystem, but
> also report any errors during migration better than just exit(1).

Let's try. Could you suggest what such a protocol might look like?
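
As a strawman to start from, here is one possible framing: JSON messages
with a 4-byte length prefix over the passed socket. Nothing here is
agreed on; the message layout and the "fs_sync" / "error" type names are
invented purely for discussion:

```python
import json
import socket
import struct

_HDR = struct.Struct("!I")  # 4-byte big-endian payload length


def send_msg(sock, msg):
    # Serialize a dict and send it as one length-prefixed frame.
    payload = json.dumps(msg).encode("utf-8")
    sock.sendall(_HDR.pack(len(payload)) + payload)


def _recv_exact(sock, n):
    # recv() may return short reads on a stream socket, so loop.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the control socket")
        buf += chunk
    return buf


def recv_msg(sock):
    # Read one frame and decode it back into a dict.
    (length,) = _HDR.unpack(_recv_exact(sock, _HDR.size))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))


if __name__ == "__main__":
    phaul_end, lxd_end = socket.socketpair()
    # p.haul asks the caller to do the final fs sync ...
    send_msg(phaul_end, {"type": "fs_sync"})
    request = recv_msg(lxd_end)
    # ... and the caller reports back, with details on failure.
    send_msg(lxd_end, {"type": "error", "text": "rsync failed"})
    reply = recv_msg(phaul_end)
    print(request, reply)
```

The same channel could carry error reports in the other direction too,
so a failed migration gives the caller more than an exit code.
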

-- Pavel