[CRIU] [PATCH v5] Add docker phaul driver

Mon Oct 26 12:02:50 PDT 2015

On 10/26/2015 09:53 PM, Hui Kang wrote:
> On Mon, Oct 26, 2015 at 2:26 PM, Pavel Emelyanov <xemul at parallels.com> wrote:
>> On 10/26/2015 08:59 PM, Hui Kang wrote:
>>> Hi, Pavel
>>> Thanks for your comment. Please see my reply inline.
>>>
>>> On Mon, Oct 26, 2015 at 5:06 AM, Pavel Emelyanov <xemul at parallels.com> wrote:
>>>> Please, find my comment inline.
>>>>
>>>>> diff --git a/p.haul b/p.haul
>>>>> index 11460b4..478e1c7 100755
>>>>> --- a/p.haul
>>>>> +++ b/p.haul
>>>>> @@ -25,7 +25,7 @@ import phaul.htype
>>>>>
>>>>>  parser = argparse.ArgumentParser("Process HAULer")
>>>>>  parser.add_argument("type", choices=phaul.htype.get_haul_names(),
>>>>> -     help="Type of hat to haul, e.g. vz or lxc")
>>>>> +     help="Type of hat to haul, e.g. vz, lxc, or docker")
>>>>>  parser.add_argument("id", help="ID of what to haul")
>>>>>  parser.add_argument("--to", help="IP where to haul")
>>>>>  parser.add_argument("--fdrpc", help="File descriptor of rpc socket", type=int, required=True)
>>>>> diff --git a/phaul/fs_haul_subtree.py b/phaul/fs_haul_subtree.py
>>>>> index a9bd559..e20ad1b 100644
>>>>> --- a/phaul/fs_haul_subtree.py
>>>>> +++ b/phaul/fs_haul_subtree.py
>>>>> @@ -11,29 +11,38 @@ import logging
>>>>>  rsync_log_file = "rsync.log"
>>>>>
>>>>>  class p_haul_fs:
>>>>> -     def __init__(self, subtree_path):
>>>>> -             logging.info("Initialized subtree FS hauler (%s)", subtree_path)
>>>>> -             self.__root = subtree_path
>>>>> +     def __init__(self, subtree_paths):
>>>>> +             self.__roots = []
>>>>> +             for path in subtree_paths:
>>>>> +                     logging.info("Initialized subtree FS hauler (%s)", path)
>>>>> +                     self.__roots.append(path)
>>>>> +
>>>>>               self.__thost = None
>>>>>
>>>>>       def set_options(self, opts):
>>>>>               self.__thost = opts["to"]
>>>>>
>>>>> +     def set_target_host(self, thost):
>>>>> +             self.__thost = thost
>>>>> +
>>>>>       def set_work_dir(self, wdir):
>>>>>               self.__wdir = wdir
>>>>>
>>>>>       def __run_rsync(self):
>>>>>               logf = open(os.path.join(self.__wdir, rsync_log_file), "w+")
>>>>> -             dst = "%s:%s" % (self.__thost, os.path.dirname(self.__root))
>>>>>
>>>>> -             # First rsync might be very long. Wait for it not
>>>>> -             # to produce big pause between the 1st pre-dump and
>>>>> -             # .stop_migration
>>>>> +             for dir_name in self.__roots:
>>>>> +
>>>>> +                     dst = "%s:%s" % (self.__thost, os.path.dirname(dir_name))
>>>>> +
>>>>> +                     # First rsync might be very long. Wait for it not
>>>>> +                     # to produce big pause between the 1st pre-dump and
>>>>> +                     # .stop_migration
>>>>>
>>>>> -             ret = sp.call(["rsync", "-a", self.__root, dst],
>>>>> +                     ret = sp.call(["rsync", "-a", dir_name, dst],
>>>>>                               stdout = logf, stderr = logf)
>>>>> -             if ret != 0:
>>>>> -                     raise Exception("Rsync failed")
>>>>> +                     if ret != 0:
>>>>> +                             raise Exception("Rsync failed")
>>>>>
>>>>>       def start_migration(self):
>>>>>               logging.info("Starting FS migration")
>>>>> diff --git a/phaul/htype.py b/phaul/htype.py
>>>>> index f3a4774..bee49b4 100644
>>>>> --- a/phaul/htype.py
>>>>> +++ b/phaul/htype.py
>>>>> @@ -11,6 +11,7 @@ __haul_modules = {
>>>>>       "vz": "p_haul_vz",
>>>>>       "pid": "p_haul_pid",
>>>>>       "lxc": "p_haul_lxc",
>>>>> +     "docker": "p_haul_docker",
>>>>>  }
>>>>>
>>>>>  def __get(id):
>>>>> diff --git a/phaul/images.py b/phaul/images.py
>>>>> index 57dca59..e48f96f 100644
>>>>> --- a/phaul/images.py
>>>>> +++ b/phaul/images.py
>>>>> @@ -145,7 +145,8 @@ class phaul_images:
>>>>>               tf = img_tar(sk, cdir)
>>>>>
>>>>>               logging.info("\tPack")
>>>>> -             for img in filter(lambda x: x.endswith(".img"), os.listdir(cdir)):
>>>>> +             # .json file is needed for docker, i.e., descriptor.json
>>>>> +             for img in filter(lambda x: x.endswith(('.img', '.json')) , os.listdir(cdir)):
>>>>
>>>> Why do we include .json files from images dir? Where do the come from?
>>>
>>> This descriptors.json is generated by "docker checkpoint". I copied
>>> its content here
>>>
>>>  cat /var/run/docker/execdriver/native/4fe6ae42044ecb216788a2dc8cae900f9bd3ca93b7e84a1cee56acfe34c46e32/criu.image/descriptors.json
>>> ["/dev/null","pipe:[1095038]","pipe:[1095039]"]
>>>
>>> "docker restore" will fail without this descriptors.json. Since this
>>> file exists with all other criu images, we can use sync_image to send
>>> the file.
>>
>> OK. Would you feed one into the transfer blob via the put_meta_images() callback?
>>
> 
> No, I do not think put_meta_images() does any transfer task.

It doesn't :) It just supplies the list of files to be put into the
transfer tarball.

> If we add
> some transfer code in put_meta_images only for the descriptor.json;
> kind of overkilling. In addition, since this descriptor.json stays
> with other criu.images, sync_imgs_to_target() maybe a better choice.
> What do you think?
> 
>>>>
>>>>>                       tf.add(img)
>>>>>
>>>>>               logging.info("\tAdd htype images")
>>>>> diff --git a/phaul/iters.py b/phaul/iters.py
>>>>> index 9be4e97..00b2a02 100644
>>>>> --- a/phaul/iters.py
>>>>> +++ b/phaul/iters.py
>>>>> @@ -113,7 +113,7 @@ class phaul_iter_worker:
>>>>>               if self.pre_dump == PRE_DUMP_AUTO_DETECT:
>>>>>                       # pre-dump auto-detection
>>>>>                       try:
>>>>> -                             self.pre_dump = self.pre_dump_check()
>>>>> +                             self.pre_dump = (self.pre_dump_check() and self.htype.can_pre_dump())
>>>>>                               logging.info("\t`- Auto %s" % (self.pre_dump and 'enabled' or 'disabled'))
>>>>>                       except:
>>>>>                               # The available criu seems to not
>>>>> @@ -121,6 +121,11 @@ class phaul_iter_worker:
>>>>>                               self.pre_dump = PRE_DUMP_DISABLE
>>>>>                               logging.info("\t`- Auto detection not possible "
>>>>>                                               "- Disabled")
>>>>> +             elif self.pre_dump == PRE_DUMP_DISABLE:
>>>>> +                     logging.info("\t`- Command-line disabled")
>>>>> +             elif self.htype.can_pre_dump() == False:
>>>>> +                     self.pre_dump = False
>>>>> +                     logging.info("\t`- Type does not support pre-dump")
>>>>
>>>> Why is this elif elif part required?
>>>
>>> This elif means if pre_dump is set true in command line, we need to
>>> further check if the phaul type supports pre_dump.
>>
>> But you don't -- the call for self.htype.can_pre_dump() is only done above,
>> when pre_dump is set to AUTO_DETECT.
> 
> This "elif self.htype.can_pre_dump() == False:" calls can_pre_dump()
> if pre_dump is not set to AUTO_DETECT.

Ouch, indeed.

>>
>> I would leave the cases then pre_dump is set explicitly alone, so that we
>> could _force_ pre_dump-s with command-line.
> 
> If we remove this "elif self.htype.can_pre_dump() == False:", then the
> chances are pre_dump is set explicity true in the command line while
> the type may not support pre_dump(). So i think this check is
> necessary.

If user with CLI says it wants pre-dump, then p.haul should just ignore anything
and go ahead with pre-dumps. This is what happens if the whole system doesn't
support pre-dumps, e.g. kernel w/o soft-dirty or criu being too old. Let's make
CLI commands do whatever user asks for even if the functionality doesn't work.
For those who doesn't want to dive into details we have default AUTO_DETECT mode
that would consult htype and do what is best for the current runtime.

>>
>>>>
>>>>>               else:
>>>>>                       logging.info("\t`- Command-line %s" % (self.pre_dump and 'enabled' or 'disabled'))
>>>>>
>>>>> diff --git a/phaul/p_haul_docker.py b/phaul/p_haul_docker.py
>>>>> new file mode 100644
>>>>> index 0000000..d2bb1e3
>>>>> --- /dev/null
>>>>> +++ b/phaul/p_haul_docker.py
>>>>> @@ -0,0 +1,158 @@
>>>>> +#
>>>>> +# Docker container hauler
>>>>> +#
>>>>> +
>>>>> +import os
>>>>> +import logging
>>>>> +import time
>>>>> +import fs_haul_subtree
>>>>> +import pycriu.rpc
>>>>> +import json
>>>>> +from pprint import pprint
>>>>> +import subprocess as sp
>>>>> +from subprocess import PIPE
>>>>> +
>>>>> +# TODO use docker-py
>>>>> +# import docker
>>>>> +
>>>>> +# Some constants for docker
>>>>> +name = "docker"
>>>>> +docker_exec = "/usr/bin/docker-1.9.0-dev"
>>>>> +docker_dir = "/var/lib/docker/"
>>>>> +docker_run_meta_dir = "/var/run/docker/execdriver/native"
>>>>> +
>>>>> +class p_haul_type:
>>>>> +     def __init__(self, ctid):
>>>>> +
>>>>> +             # TODO ctid must > 3 digit; with docker-py, we can also resolve
>>>>> +             #         container name
>>>>> +             if len(ctid) < 3:
>>>>> +                     raise Exception("Docker container ID must be > 3 digits")
>>>>> +
>>>>> +             self._ctid = ctid
>>>>> +             self._ct_rootfs = ""
>>>>> +
>>>>> +     def get_driver_name(self):
>>>>> +             return name
>>>>> +
>>>>> +     def init_src(self):
>>>>> +             self.full_ctid = self.get_full_ctid()
>>>>> +             self.__load_ct_config(docker_dir)
>>>>> +
>>>>> +     def init_dst(self):
>>>>> +             pass
>>>>> +
>>>>> +     def adjust_criu_req(self, req):
>>>>> +             """Add module-specific options to criu request"""
>>>>> +             pass
>>>>> +
>>>>> +     def root_task_pid(self):
>>>>> +             # Do we need this for Docker?
>>>>> +             return self.full_ctid
>>>>> +
>>>>> +     def __load_ct_config(self, path):
>>>>> +
>>>>> +             # Each docker container has 3 directories that need to be
>>>>> +             # migrated: (1) root filesystem, (2) container configuration,
>>>>> +             # (3) runtime meta state
>>>>> +             self._ct_rootfs = os.path.join(docker_dir, "aufs/mnt", self.full_ctid)
>>>>> +             self._ct_config_dir = os.path.join(docker_dir, "containers", self.full_ctid)
>>>>> +             self._ct_run_meta_dir = os.path.join(docker_run_meta_dir, self.full_ctid)
>>>>> +             logging.info("Container rootfs: %s", self._ct_rootfs)
>>>>> +             logging.info("Container config: %s", self._ct_config_dir)
>>>>> +             logging.info("Container meta: %s", self._ct_run_meta_dir)
>>>>> +
>>>>> +     def set_options(self, opts):
>>>>> +             pass
>>>>> +
>>>>> +     # Remove any specific FS setup
>>>>> +     def umount(self):
>>>>> +             pass
>>>>> +
>>>>> +     def get_fs(self, fs_sk=None):
>>>>> +             # use rsync for rootfs and configuration directories
>>>>> +             return fs_haul_subtree.p_haul_fs([self._ct_rootfs, self._ct_config_dir])
>>>>> +
>>>>> +     def get_fs_receiver(self, fs_sk=None):
>>>>> +             return None
>>>>> +
>>>>> +     def get_full_ctid(self):
>>>>> +             dir_name_list = os.listdir(os.path.join(docker_dir, "containers"))
>>>>> +
>>>>> +             full_id = ""
>>>>> +             for name in dir_name_list:
>>>>> +                     name = name.rsplit("/")
>>>>> +                     if (name[0].find(self._ctid) == 0):
>>>>> +                             full_id = name[0]
>>>>> +                             break
>>>>> +
>>>>> +             if full_id != "":
>>>>> +                     return full_id
>>>>> +             else:
>>>>> +                     raise Exception("Can not find container fs")
>>>>> +
>>>>> +     def final_dump(self, pid, img, ccon, fs):
>>>>> +             logging.info("Dump docker container %s", pid)
>>>>> +
>>>>> +             # TODO: docker API does not have checkpoint right now
>>>>> +             # cli.checkpoint() so we have to use the command line
>>>>> +             # cli = docker.Client(base_url='unix://var/run/docker.sock')
>>>>> +             # output = cli.info()
>>>>> +             # call docker API
>>>>> +
>>>>> +             logf = open("/tmp/docker_checkpoint.log", "w+")
>>>>> +             image_path_opt = "--image-dir=" + img.image_dir()
>>>>> +             ret = sp.call([docker_exec, "checkpoint", image_path_opt, self._ctid],
>>>>> +                     stdout = logf, stderr = logf)
>>>>> +             if ret != 0:
>>>>> +                     raise Exception("docker checkpoint failed")
>>>>> +     #
>>>>> +     # Meta-images for docker -- /var/run/docker
>>>>> +     #
>>>>> +     def get_meta_images(self, path):
>>>>> +             # Send the meta state file with criu images
>>>>> +             return [(os.path.join(self._ct_run_meta_dir, "state.json"), "state.json")]
>>>>> +
>>>>> +     def put_meta_images(self, dir):
>>>>> +             # Create docker runtime meta dir on dst side
>>>>> +             with open(os.path.join(dir, "state.json")) as data_file:
>>>>> +                     data = json.load(data_file)
>>>>> +             self.full_ctid=data["id"]
>>>>> +
>>>>> +             self.__load_ct_config(docker_dir)
>>>>> +             os.makedirs(self._ct_run_meta_dir)
>>>>> +             pd = sp.Popen(["cp", os.path.join(dir, "state.json"), self._ct_run_meta_dir], stdout = PIPE)
>>>>> +             status = pd.wait()
>>>>> +
>>>>> +     def kill_last_docker_daemon(self):
>>>>> +             p = sp.Popen(['pgrep', '-l' , docker_exec], stdout=sp.PIPE)
>>>>> +             out, err = p.communicate()
>>>>> +
>>>>> +             for line in out.splitlines():
>>>>> +                     line = bytes.decode(line)
>>>>> +                     pid = int(line.split(None, 1)[0])
>>>>> +                     os.kill(pid, signal.SIGKILL)
>>>>> +
>>>>> +     def final_restore(self, img, criu):
>>>>> +             logf = open("/tmp/docker_restore.log", "w+")
>>>>> +
>>>>> +             # Kill any previous docker daemon in order to reload the
>>>>> +             # status of the migrated container
>>>>> +             self.kill_last_docker_daemon()
>>>>> +
>>>>> +             # start docker daemon in background
>>>>> +             daemon = sp.Popen([docker_exec, "daemon", "-s", "aufs"],
>>>>> +                              stdout = logf, stderr = logf)
>>>>> +             # daemon.wait() TODO: docker daemon not return
>>>>> +             time.sleep(2)
>>>>> +
>>>>> +             image_path_opt = "--image-dir=" + img.image_dir()
>>>>> +             ret = sp.call([docker_exec, "restore", image_path_opt, self._ctid],
>>>>> +                                             stdout = logf, stderr = logf)
>>>>> +             if ret != 0:
>>>>> +                     raise Exception("docker restore failed")
>>>>> +
>>>>> +     def can_pre_dump(self):
>>>>> +             # XXX: Do not do predump for docker right now. Add page-server
>>>>> +             #       to docker C/R API, then we can enable the pre-dump
>>>>> +             return False
>>>>> diff --git a/phaul/p_haul_pid.py b/phaul/p_haul_pid.py
>>>>> index e354dff..43af654 100644
>>>>> --- a/phaul/p_haul_pid.py
>>>>> +++ b/phaul/p_haul_pid.py
>>>>> @@ -94,3 +94,6 @@ class p_haul_type:
>>>>>       # Get list of veth pairs if any
>>>>>       def veths(self):
>>>>>               return []
>>>>> +
>>>>> +     def can_pre_dump(self):
>>>>> +             return True
>>>>> diff --git a/phaul/p_haul_vz.py b/phaul/p_haul_vz.py
>>>>> index 903006d..e346bdf 100644
>>>>> --- a/phaul/p_haul_vz.py
>>>>> +++ b/phaul/p_haul_vz.py
>>>>> @@ -244,6 +244,9 @@ class p_haul_type:
>>>>>               return True
>>>>>
>>>>>
>>>>> +     def can_pre_dump(self):
>>>>> +             return True
>>>>> +
>>>>>  def parse_vz_config(body):
>>>>>       """Parse shell-like virtuozzo config file"""
>>>>>
>>>>> diff --git a/phaul/service.py b/phaul/service.py
>>>>> index a03b43e..507d171 100644
>>>>> --- a/phaul/service.py
>>>>> +++ b/phaul/service.py
>>>>> @@ -72,7 +72,8 @@ class phaul_service:
>>>>>       def rpc_start_iter(self):
>>>>>               self.dump_iter_index += 1
>>>>>               self.img.new_image_dir()
>>>>> -             self.start_page_server()
>>>>> +             if self.htype.get_driver_name() != "docker" :
>>>>> +                     self.start_page_server()
>>>>
>>>> Explicit docker check again. What bad happens if you let the page server
>>>> spawn?
>>>
>>> Here is the error message after remoing the check for docker:
>>>
>>> 17:50:24.551:     Page server started at 3833
>>> 17:50:32.329: Started images server
>>> 17:50:32.333: Exception in untar_thread
>>> Traceback (most recent call last):
>>>   File "/root/development/p.haul/phaul/images.py", line 42, in run
>>>     tf = tarfile.open(mode="r|", fileobj=util.fileobj_wrap(self.__sk))
>>>   File "/usr/lib/python2.7/tarfile.py", line 1690, in open
>>>     **kwargs)
>>>   File "/usr/lib/python2.7/tarfile.py", line 1574, in __init__
>>>     self.firstmember = self.next()
>>>   File "/usr/lib/python2.7/tarfile.py", line 2335, in next
>>>     raise ReadError(str(e))
>>> ReadError: bad checksum
>>
>> O_o But this has (should have) nothing to do with page-server, it's
>> just some tarfile internal stuff.
>>
> 
> You are right. However, whenever I disable page_server, the error
> message goes away. Eachtime I enable it, error comes back. So I doubt
> page_server() may be relevant to the error.

I would nonetheless investigate why this happens this way, but in low
priority, all the more so we're going to ban page-server with htype
hook.

>>> 17:50:32.348: Waiting for images to unpack
>>> 17:50:32.349: Restoring from images
>>>
>>> I doubt that is is because there is not pre_dump for docker, so the
>>> page_server does not receive any criu image. Is this correct?
>>
>> AFAIU yes, you do final dump on disk, so page server is reqlly unneeded
>> here. Please, see my next commend about what to do with it.
>>
>>>>
>>>> BTW, a question to Nikita -- I don't see in p_haul_vz.py that we do final
>>>> dump go via page-server. Do we?
>>>
>>> >From my understanding, no final dump may go via page-server. The criu
>>> images are sent using tar/untar via the mem_sk.
>>
>> OK, so what we have here is -- final dump for everyone is done w/o the
>> page server. That's a pity we should fix, but for now let's make yet
>> another (temporary) htype method 'dump_need_ps()' denoting that final
>> dump does or does not go via page-server and start or not this guy
>> respectively. OK? Some time soon this will be removed, but for now
>> let's make it w/o explicit "name == docker" checks.
> 
> I like this idea to remove the name check for docker.

OK, thanks :)

-- Pavel