[CRIU] [PATCH 01/10] p.haul: implement migration over existing connections

Tycho Andersen tycho.andersen at canonical.com
Wed Oct 21 09:13:14 PDT 2015


On Wed, Oct 21, 2015 at 03:12:56PM +0300, Pavel Emelyanov wrote:
> On 10/19/2015 11:55 PM, Tycho Andersen wrote:
> > On Mon, Oct 19, 2015 at 11:39:35AM +0300, Pavel Emelyanov wrote:
> >> On 10/15/2015 10:20 PM, Tycho Andersen wrote:
> >>> On Thu, Oct 15, 2015 at 12:21:35PM +0300, Pavel Emelyanov wrote:
> >>>> On 10/14/2015 10:27 PM, Tycho Andersen wrote:
> >>>>> Hi Nikita,
> >>>>>
> >>>>> Thanks for this work, it will be very useful for us.
> >>>>>
> >>>>> On Fri, Oct 09, 2015 at 09:11:33PM +0400, Nikita Spiridonov wrote:
> >>>>>> Remove standalone mode; p.haul can now work only over existing
> >>>>>> connections, specified via command line arguments as file
> >>>>>> descriptors.
> >>>>>>
> >>>>>> Three arguments are required: --fdrpc for RPC calls, --fdmem for
> >>>>>> c/r image migration and --fdfs for disk migration. Each file
> >>>>>> descriptor is expected to represent a socket opened in blocking
> >>>>>> mode, with domain AF_INET and type SOCK_STREAM.
> >>>>>
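For reference, the caller side of that contract could look roughly like
this (Python; the hosts, ports and positional arguments are all made up
-- only the three --fd* options come from the patch):

import socket
import subprocess

# Hypothetical destination and ports; a real caller negotiates these.
rpc_sk = socket.create_connection(("dst.example", 9876))
mem_sk = socket.create_connection(("dst.example", 9877))
fs_sk = socket.create_connection(("dst.example", 9878))

fds = (rpc_sk.fileno(), mem_sk.fileno(), fs_sk.fileno())
subprocess.check_call(
    ["p.haul", "lxc", "mycontainer",     # positional args are a guess
     "--fdrpc", str(fds[0]),
     "--fdmem", str(fds[1]),
     "--fdfs", str(fds[2])],
    pass_fds=fds)                        # keep the sockets inheritable
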
> >>>>> Do we have to require --fdfs here for anything? I haven't looked
> >>>>> through the code to see why exactly it is required.
> >>>>
> >>>> The fdfs socket is required to copy the filesystem, but (!) only if
> >>>> that copy is actually needed. If the storage the container's files
> >>>> are on is shared, then this fd will effectively go unused.
> >>>>
> >>>> I think we can do it like this: one can omit the parameter, but if
> >>>> the htype driver says that fs migration _is_ required, then p.haul
> >>>> will fail with the error "no data channel for fs migration". Does
> >>>> this sound OK to you?
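
In other words, something like this hypothetical check inside p.haul
(the names are made up):

def pick_fs_channel(htype, fdfs):
    # On shared storage no fs data channel is needed at all.
    if not htype.requires_fs_migration():
        return None
    if fdfs is None:
        raise Exception("no data channel for fs migration")
    return fdfs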
> >>>
> >>> Yep, that sounds fine.
> >>>
> >>>>> In LXD (and I guess openvz as well, with your ploop patch) we are
> >>>>> managing our own storage backends, and have our own mechanism for
> >>>>> transporting the rootfs. 
> >>>>
> >>>> Can you shed more light on this? :) If there's some backend that can
> >>>> be used by us as well, maybe it would make sense to put migration code
> >>>> into p.haul?
> >>>
> >>> Right now we have backends for zfs, lvm, btrfs, and just a regular
> >>> directory on a filesystem. I'm not aware of us planning support for
> >>> any other backends right now, but it's not out of the question.
> >>> Additionally, we also want to migrate a container's snapshots when we
> >>> migrate the container, which requires something to know about how we
> >>> handle snapshotting for these various storage backends as well.
> >>
> >> Yup, pretty same for us :)
> >>
> >>> We also support non-live copying of containers, so we need the code
> >>> even without p.haul, and ideally it would be good not to maintain it
> >>> in two places, but,
> >>
> >> You mean off-line copying a container across nodes?
> > 
> > Yep exactly.
> > 
> >>>>> Ideally, I could invoke p.haul over an fd to
> >>>>> just do the criu iterative piece, and potentially do some callbacks to
> >>>>> tell LXD when the process is stopped so that we can do a final fs
> >>>>> sync.
> >>>>
> >>>> The issue is that fs sync is tightly coupled with the memory migration
> >>>> iterations; that's why I planned to put all this stuff into p.haul. If
> >>>> you do the final fs sync, and while doing so the amount of memory to be
> >>>> copied increases, it might make sense to do one more iteration of
> >>>> pre-copy. Without p.haul having full control over both (memory and fs),
> >>>> that's hardly possible.
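
The coupling Pavel describes is roughly this control loop (a sketch
only; every name in it is made up):

def migrate(pre_copy_iter, sync_fs, dirty_estimate, threshold):
    # Each fs sync can dirty more memory, so after syncing we may decide
    # to run one more pre-copy pass before freezing the container.
    while True:
        pre_copy_iter()
        sync_fs()
        if dirty_estimate() < threshold:
            break    # freeze the task, then final dump + final fs sync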
> >>>
> >>> What about passing p.haul a socket and inventing a messaging protocol?
> >>> Then p.haul could ask LXD (or whoever) to sync the filesystem, but
> >>> also report any errors during migration better than just exit(1).
> >>
> >> Let's try. Can you suggest what such a protocol might look like?
> > 
> > What about something like,
> > 
> > enum phaulmsgtype {
> > 	ERROR		= 0;
> > 	SYNCFS		= 1;
> > 	SUCCESS		= 2;
> > 	UNSUP		= 3;
> > 	/* other message types as necessary */
> > }
> > 
> > message phaul {
> > 	required phaulmsgtype	type		= 1;
> > 
> > 	/* for ERROR and SUCCESS, perhaps just the contents of the
> > 	 * CRIU log?
> > 	 */
> > 	optional string		message		= 2;
> > }
> > 
> > which you pass to p.haul via a --msgfd. I can think of a few ways it
> > could work:
> > 
> > * if you pass msgfd, your client always has to move the filesystem.
> >   This seems a little ugly though: getting the logs (and not just
> >   p.haul's exit code) may be useful for others too, and this way they
> >   wouldn't have to know how p.haul drives CRIU to find them.
> > 
> > * when you pass msgfd, p.haul will send a SYNCFS message. If it gets
> >   an UNSUP message back, it falls back to the htype driver's storage
> >   backend (or fails if that is unavailable too). If it is supported,
> >   the p.haul caller sends either a SUCCESS or an ERROR message,
> >   depending on what happened.
> > 
> > Does that make sense? I haven't looked at the p.haul code much, so I
> > could be totally off base.
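
As a concrete sketch of the caller side -- entirely hypothetical,
assuming the messages above are compiled into a phaul_pb2 module and
framed with a simple 4-byte length prefix (no framing has been
specified yet):

import socket
import struct

import phaul_pb2   # hypothetical module generated from the .proto above

def recv_msg(sk):
    # Framing assumption: 4-byte big-endian length, then the message.
    ln = struct.unpack("!I", sk.recv(4, socket.MSG_WAITALL))[0]
    msg = phaul_pb2.phaul()
    msg.ParseFromString(sk.recv(ln, socket.MSG_WAITALL))
    return msg

def send_msg(sk, mtype, text=""):
    msg = phaul_pb2.phaul(type=mtype, message=text)
    data = msg.SerializeToString()
    sk.sendall(struct.pack("!I", len(data)) + data)

def serve_msgfd(sk, syncfs_cb):
    # Caller side (e.g. an LXD wrapper): service p.haul's requests until
    # a final SUCCESS/ERROR arrives, then hand that status back.
    while True:
        msg = recv_msg(sk)
        if msg.type == phaul_pb2.SYNCFS:
            try:
                syncfs_cb()
                send_msg(sk, phaul_pb2.SUCCESS)
            except Exception as e:
                send_msg(sk, phaul_pb2.ERROR, str(e))
        else:
            return msg   # message may carry the CRIU log contents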
> 
> At first glance -- it does. Then we need some fs_haul_external.py module
> that p_haul_lxc would return, and which would take care of communicating
> with the caller about moving the FS around :)
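
Very roughly, and with every name hypothetical (I haven't checked the
exact interface p.haul expects from its fs drivers), such a module could
just forward both sync points over msgfd, reusing the send_msg/recv_msg
helpers from the sketch above:

# fs_haul_external.py -- hypothetical sketch of an fs driver that
# delegates the actual rootfs transfer to the p.haul caller via msgfd.

import phaul_pb2                 # same assumption as above
from phaul_msg import send_msg, recv_msg  # hypothetical helper module

class p_haul_fs:
    def __init__(self, msg_sk):
        self.msg_sk = msg_sk     # the socket behind --msgfd

    def start_migration(self):
        # Initial rootfs copy, while the container is still running.
        self.__request_sync()

    def stop_migration(self):
        # Final sync, done while the container is frozen.
        self.__request_sync()

    def __request_sync(self):
        send_msg(self.msg_sk, phaul_pb2.SYNCFS)
        rep = recv_msg(self.msg_sk)
        if rep.type != phaul_pb2.SUCCESS:
            raise Exception("caller-side fs sync failed: " + rep.message)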

Yep, sounds good. I have some other p.haul-related patches that I want
to send, but I want to do this as well, to try and get p.haul working
end to end in LXD. Stay tuned for further developments :)

Tycho

