[CRIU] [PATCH 01/10] p.haul: implement migration over existing connections
Nikita Spiridonov
nspiridonov at odin.com
Tue Oct 20 02:16:01 PDT 2015
On Mon, 2015-10-19 at 14:55 -0600, Tycho Andersen wrote:
> On Mon, Oct 19, 2015 at 11:39:35AM +0300, Pavel Emelyanov wrote:
> > On 10/15/2015 10:20 PM, Tycho Andersen wrote:
> > > On Thu, Oct 15, 2015 at 12:21:35PM +0300, Pavel Emelyanov wrote:
> > >> On 10/14/2015 10:27 PM, Tycho Andersen wrote:
> > >>> Hi Nikita,
> > >>>
> > >>> Thanks for this work, it will be very useful for us.
> > >>>
> > >>> On Fri, Oct 09, 2015 at 09:11:33PM +0400, Nikita Spiridonov wrote:
> > >>>> Remove standalone mode; p.haul can now work only over existing
> > >>>> connections specified via command line arguments as file
> > >>>> descriptors.
> > >>>>
> > >>>> Three arguments are required: --fdrpc for rpc calls, --fdmem for
> > >>>> c/r image migration and --fdfs for disk migration. Each file
> > >>>> descriptor is expected to represent a socket opened in blocking
> > >>>> mode with domain AF_INET and type SOCK_STREAM.
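For illustration, a caller might set these descriptors up like the
sketch below (the destination address is made up, other p.haul
arguments are omitted, and pass_fds is just one way to keep the
sockets inherited by the child):

    import socket
    import subprocess

    # Hypothetical destination node; all three channels are plain
    # blocking AF_INET/SOCK_STREAM connections, as described above.
    DST = ("dst.example.com", 12345)

    rpc_sk = socket.create_connection(DST)  # --fdrpc: rpc calls
    mem_sk = socket.create_connection(DST)  # --fdmem: c/r images
    fs_sk = socket.create_connection(DST)   # --fdfs: disk migration

    fds = (rpc_sk.fileno(), mem_sk.fileno(), fs_sk.fileno())

    # Pass the raw descriptor numbers on the command line; pass_fds
    # keeps the sockets open and inheritable in the p.haul child.
    subprocess.check_call(
        ["p.haul", "--fdrpc", str(fds[0]),
         "--fdmem", str(fds[1]),
         "--fdfs", str(fds[2])],
        pass_fds=fds)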
> > >>>
> > >>> Do we have to require --fdfs here for anything? I haven't looked
> > >>> through the code to see why exactly it is required.
> > >>
> > >> The --fdfs socket is required to copy the filesystem, but (!) only
> > >> when that is actually needed. If the storage the container's files
> > >> are on is shared, then this fd will effectively go unused.
> > >>
> > >> I think we can do it like this -- one can omit this parameter, but
> > >> if the htype driver says that fs migration _is_ required, then
> > >> p.haul will fail with the error "no data channel for fs migration".
> > >> Does this sound OK to you?
> > >
> > > Yep, that sounds fine.
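That behavior could be a few lines in p.haul's setup path; a sketch,
with every name invented for illustration (not the actual p.haul API):

    # --fdfs becomes optional; fail only when the htype driver
    # actually needs a filesystem channel.
    def pick_fs_channel(args, htype):
        if not htype.needs_fs_migration():
            return None  # shared storage -- the fs channel goes unused
        if args.fdfs is None:
            raise Exception("no data channel for fs migration")
        return args.fdfs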
> > >
> > >>> In LXD (and I guess openvz as well, with your ploop patch) we are
> > >>> managing our own storage backends, and have our own mechanism for
> > >>> transporting the rootfs.
> > >>
> > >> Can you shed more light on this? :) If there's some backend that can
> > >> be used by us as well, maybe it would make sense to put migration code
> > >> into p.haul?
> > >
> > > Right now we have backends for zfs, lvm, btrfs, and just a regular
> > > directory on a filesystem. I'm not aware of us planning support for
> > > any other backends right now, but it's not out of the question.
> > > Additionally, we also want to migrate a container's snapshots when we
> > > migrate the container, which requires something to know about how we
> > > handle snapshotting for these various storage backends as well.
> >
> > Yup, pretty much the same for us :)
> >
> > > We also support non-live copying of containers, so we need the code
> > > even without p.haul, and ideally we would not maintain it in two
> > > places, but,
> >
> > You mean off-line copying of a container across nodes?
>
> Yep exactly.
>
> > >>> Ideally, I could invoke p.haul over an fd to
> > >>> just do the criu iterative piece, and potentially do some callbacks to
> > >>> tell LXD when the process is stopped so that we can do a final fs
> > >>> sync.
> > >>
> > >> The issue is that fs sync is tightly coupled with the memory
> > >> migration iterations; that's why I planned to put all this stuff
> > >> into p.haul. If you do the final fs sync and, while doing it, the
> > >> amount of memory to be copied increases, it might make sense to do
> > >> one more iteration of pre-copy. Without p.haul having full control
> > >> over both (memory and fs) that's hardly possible.
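Roughly, the loop Pavel has in mind could look like the sketch below;
every name is invented for illustration, this is not the actual p.haul
code:

    # Illustrative only: memory pre-copy and fs sync driven by one
    # loop. If memory keeps growing while the fs sync runs, another
    # pre-copy pass may pay off before freezing the container.
    def migrate(task, fs, max_iters=8, mem_threshold=64 << 20):
        for _ in range(max_iters):
            task.pre_dump()   # one iteration of memory pre-copy
            fs.sync()         # copy the fs changes accumulated so far
            if task.dirty_memory_size() < mem_threshold:
                break         # both channels are quiet enough
        task.freeze()
        fs.final_sync()       # nothing can change from now on
        task.final_dump()     # final dump, then restore on destination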
> > >
> > > What about passing p.haul a socket and inventing a messaging protocol?
> > > Then p.haul could ask LXD (or whoever) to sync the filesystem, but
> > > also report any errors during migration better than just exit(1).
> >
> > Let's try. Can you suggest what such a protocol might look like?
>
> What about something like,
>
> enum phaulmsgtype {
>     ERROR = 0;
>     SYNCFS = 1;
>     SUCCESS = 2;
>     /* other message types as necessary */
> }
>
> message phaul {
>     required phaulmsgtype type = 1;
>
>     /* for ERROR and SUCCESS, perhaps just the contents of the
>      * CRIU log?
>      */
>     optional string message = 2;
> }
>
> which you pass to p.haul via a --msgfd. I can think of a few ways it
> could work:
>
> * if you pass msgfd, your client always has to move the filesystem.
> This seems a little ugly though, as getting the logs (and not just
> p.haul's exit code) may be useful for others, so they don't have to
> know how p.haul drives CRIU to know where to look for the logs.
>
> * when you pass msgfd, p.haul will send a SYNCFS message. If it gets
> an UNSUP message back, it uses the htype driver's storage backend
> (or fails if this also fails). If it is supported, the p.haul caller
> either sends a SUCCESS or ERROR message depending on what happened.
>
> Does that make sense? I haven't looked at the p.haul code much, so I
> could be totally off base.
>
> Tycho
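To make Tycho's second option concrete, here is a sketch of p.haul's
side of the protocol. It assumes length-prefixed protobuf framing, an
UNSUP value added to the enum above, a hypothetical compiled module
phaul_pb2, and an htype.migrate_fs() standing in for the driver's own
storage path -- none of this is existing p.haul code:

    import struct

    import phaul_pb2  # hypothetical module compiled from the .proto above

    def _recvn(sk, n):
        # read exactly n bytes from a blocking socket
        buf = b""
        while len(buf) < n:
            chunk = sk.recv(n - len(buf))
            if not chunk:
                raise EOFError("msgfd closed")
            buf += chunk
        return buf

    def send_msg(sk, msg):
        # 4-byte big-endian length prefix, then the serialized message
        data = msg.SerializeToString()
        sk.sendall(struct.pack("!I", len(data)) + data)

    def recv_msg(sk):
        (size,) = struct.unpack("!I", _recvn(sk, 4))
        msg = phaul_pb2.phaul()
        msg.ParseFromString(_recvn(sk, size))
        return msg

    def sync_fs(msg_sk, htype):
        # ask the caller (e.g. LXD) to do the fs sync
        req = phaul_pb2.phaul()
        req.type = phaul_pb2.SYNCFS
        send_msg(msg_sk, req)

        reply = recv_msg(msg_sk)
        if reply.type == phaul_pb2.UNSUP:
            htype.migrate_fs()  # fall back to the driver's own backend
        elif reply.type == phaul_pb2.ERROR:
            raise Exception("caller failed to sync fs: " + reply.message)
        # SUCCESS means the caller moved the filesystem itself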
I can suggest slightly different semantics for --msgfd:
* We can use it unconditionally (if it was specified) for diagnostic
messages (e.g. ERROR, SUCCESS) and interaction with the parent.
* We can encapsulate more complicated logic (e.g. SYNCFS) inside a
specific phaul module (lxc in your case). The lxc module will create a
dedicated fs driver (similar to fs_haul_subtree or fs_haul_ploop)
which will send messages through msgfd instead of doing the actual
work itself -- see the sketch below.
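Something like this, reusing the send_msg/recv_msg helpers from the
sketch above; the method names are modeled loosely on the existing
fs_haul_* drivers and may not match the real interface:

    # An fs driver that delegates the actual copying to the parent
    # over msgfd instead of doing it itself.
    class fs_haul_msgfd(object):
        def __init__(self, msg_sk):
            self.msg_sk = msg_sk

        def start_migration(self):
            pass  # nothing to pre-copy, the parent owns the fs

        def next_iteration(self):
            pass  # likewise, no per-iteration fs work on our side

        def stop_migration(self):
            # the container is frozen now; ask the parent (e.g. lxd)
            # to do the final fs sync and wait for the verdict
            req = phaul_pb2.phaul()
            req.type = phaul_pb2.SYNCFS
            send_msg(self.msg_sk, req)
            reply = recv_msg(self.msg_sk)
            if reply.type != phaul_pb2.SUCCESS:
                raise Exception("final fs sync failed: " + reply.message)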
It is not much better than your two ways actually, but it is easier to
implement and it will not affect other phaul modules. Some changes to
fs/fs_receiver handling in p_haul_iters/p_haul_service are needed for
such an implementation. msgfd is definitely needed, but I can't suggest
a good design for it right away.
Btw, I like the idea of using protobuf for msgfd.