[CRIU] [PATCH 01/10] p.haul: implement migration over existing connections
Tycho Andersen
tycho.andersen at canonical.com
Mon Oct 19 13:55:39 PDT 2015
On Mon, Oct 19, 2015 at 11:39:35AM +0300, Pavel Emelyanov wrote:
> On 10/15/2015 10:20 PM, Tycho Andersen wrote:
> > On Thu, Oct 15, 2015 at 12:21:35PM +0300, Pavel Emelyanov wrote:
> >> On 10/14/2015 10:27 PM, Tycho Andersen wrote:
> >>> Hi Nikita,
> >>>
> >>> Thanks for this work, it will be very useful for us.
> >>>
> >>> On Fri, Oct 09, 2015 at 09:11:33PM +0400, Nikita Spiridonov wrote:
> >>>> Remove standalone mode; p.haul can now work only over existing
> >>>> connections specified via command line arguments as file
> >>>> descriptors.
> >>>>
> >>>> Three arguments are required: --fdrpc for rpc calls, --fdmem for
> >>>> c/r image migration, and --fdfs for disk migration. Each file
> >>>> descriptor is expected to represent a socket opened in blocking
> >>>> mode with domain AF_INET and type SOCK_STREAM.
> >>>
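For concreteness, I'd expect a caller to hand the sockets over roughly
like this (just a sketch -- only the --fdrpc/--fdmem/--fdfs arguments
come from the patch; the ports and helper name are made up):

import socket
import subprocess

def launch_phaul(host):
    # One blocking AF_INET/SOCK_STREAM connection per channel, as the
    # patch expects; the port numbers are invented for the example.
    rpc = socket.create_connection((host, 9998))
    mem = socket.create_connection((host, 9999))
    fs = socket.create_connection((host, 10000))
    subprocess.check_call(
        ["p.haul",
         "--fdrpc", str(rpc.fileno()),
         "--fdmem", str(mem.fileno()),
         "--fdfs", str(fs.fileno())],
        # keep the sockets open across exec so p.haul inherits them
        pass_fds=[rpc.fileno(), mem.fileno(), fs.fileno()])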
> >>> Do we have to require --fdfs here for anything? I haven't looked
> >>> through the code to see why exactly it is required.
> >>
> >> The --fdfs socket is required to copy the filesystem, but (!) only
> >> when that's actually needed. If the storage the container's files
> >> live on is shared, then this fd effectively goes unused.
> >>
> >> I think we can do it like this -- one can omit this parameter, but
> >> if the htype driver says that fs migration _is_ required, then
> >> p.haul will fail with the error "no data channel for fs migration".
> >> Does this sound OK to you?
> >
> > Yep, that sounds fine.
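Something like this is what I had in mind (a sketch only -- none of
these names exist in p.haul as far as I know):

import socket

def get_fs_channel(fdfs, htype):
    # --fdfs was given: wrap the inherited fd in a socket object
    if fdfs is not None:
        return socket.fromfd(fdfs, socket.AF_INET, socket.SOCK_STREAM)
    # no channel, but the htype driver says the fs must be copied
    if htype.needs_fs_migration():
        raise Exception("no data channel for fs migration")
    # shared storage: nothing to transfer
    return None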
> >
> >>> In LXD (and I guess openvz as well, with your ploop patch) we are
> >>> managing our own storage backends, and have our own mechanism for
> >>> transporting the rootfs.
> >>
> >> Can you shed more light on this? :) If there's some backend that can
> >> be used by us as well, maybe it would make sense to put migration code
> >> into p.haul?
> >
> > Right now we have backends for zfs, lvm, btrfs, and just a regular
> > directory on a filesystem. I'm not aware of us planning support for
> > any other backends, but it's not out of the question. Additionally,
> > we want to migrate a container's snapshots when we migrate the
> > container, which requires knowing how we handle snapshotting for
> > these various storage backends as well.
>
> Yup, pretty same for us :)
>
> > We also support non-live copying of containers, so we need this code
> > even without p.haul, and ideally we wouldn't maintain it in two
> > places, but,
>
> You mean off-line copying a container across nodes?
Yep, exactly.
> >>> Ideally, I could invoke p.haul over an fd to
> >>> just do the criu iterative piece, and potentially do some callbacks to
> >>> tell LXD when the process is stopped so that we can do a final fs
> >>> sync.
> >>
> >> The issue is that fs sync is tightly coupled with the memory
> >> migration iterations; that's why I planned to put all this stuff
> >> into p.haul. If you do the final fs sync and, while doing so, the
> >> amount of memory to be copied grows, it might make sense to do one
> >> more pre-copy iteration. Without p.haul having full control over
> >> both (memory and fs), that's hardly possible.
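Right -- so the control loop has to look something like this (pure
sketch, none of these names are real p.haul or CRIU API):

THRESHOLD = 1024  # pages still dirty; the number is invented

def migrate(criu, htype):
    while True:
        criu.pre_dump()     # one iterative memory pre-copy pass
        htype.sync_fs()     # fs sync while the task still runs
        # if the fs sync took long enough that lots of memory was
        # dirtied again, go around for another pre-copy iteration
        if criu.dirty_pages() < THRESHOLD:
            break
    criu.dump()             # freeze the task and take the final dump
    htype.final_sync_fs()   # the fs is quiescent now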
> >
> > What about passing p.haul a socket and inventing a messaging protocol?
> > Then p.haul could ask LXD (or whoever) to sync the filesystem, but
> > also report any errors during migration better than just exit(1).
>
> Let's try. Could you suggest what such a protocol might look like?
What about something like this:

enum phaulmsgtype {
    ERROR = 0;
    SYNCFS = 1;
    SUCCESS = 2;
    UNSUP = 3;
    /* other message types as necessary */
}

message phaul {
    required phaulmsgtype type = 1;
    /* for ERROR and SUCCESS, perhaps just the contents of the
     * CRIU log?
     */
    optional string message = 2;
}
You'd pass this fd to p.haul via a --msgfd argument. I can think of a
few ways it could work:
* if you pass msgfd, your client always has to move the filesystem.
  This seems a little ugly though, since getting the logs (and not just
  p.haul's exit code) may be useful for other callers too, so that they
  don't have to know how p.haul drives CRIU just to find the logs.
* when you pass msgfd, p.haul sends a SYNCFS message when it wants the
  filesystem synced. If it gets an UNSUP message back, it falls back to
  the htype driver's storage backend (or fails if that fails too). If
  the caller does support it, it replies with either a SUCCESS or an
  ERROR message depending on what happened (see the sketch after this
  list).
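Here's a sketch of p.haul's side of that handshake, with simple
length-prefixed framing on the wire (everything below is an
assumption -- phaul_pb2 stands in for whatever protoc would generate
from the message above, and the htype fallback is hypothetical):

import socket
import struct

import phaul_pb2  # assumed: generated by protoc from the .proto above

def send_msg(sock, msgtype, text=""):
    data = phaul_pb2.phaul(type=msgtype, message=text).SerializeToString()
    # 4-byte big-endian length prefix, then the serialized message
    sock.sendall(struct.pack("!I", len(data)) + data)

def recv_msg(sock):
    (length,) = struct.unpack("!I", sock.recv(4, socket.MSG_WAITALL))
    msg = phaul_pb2.phaul()
    msg.ParseFromString(sock.recv(length, socket.MSG_WAITALL))
    return msg

def sync_fs(msg_sock, htype):
    # ask the caller (e.g. LXD) to sync the filesystem
    send_msg(msg_sock, phaul_pb2.SYNCFS)
    reply = recv_msg(msg_sock)
    if reply.type == phaul_pb2.UNSUP:
        # the caller doesn't manage storage; fall back to the htype driver
        htype.sync_fs()
    elif reply.type == phaul_pb2.ERROR:
        raise Exception("caller failed to sync fs: %s" % reply.message)
    # otherwise reply.type == phaul_pb2.SUCCESS and we're done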
Does that make sense? I haven't looked at the p.haul code much, so I
could be totally off base.
Tycho