[CRIU] [PATCH 0/7] CRiu Image Tool(crit), v4

Tue Jan 13 09:48:09 PST 2015

On 01/13/2015 04:35 PM, Pavel Emelyanov wrote:
>>>> b) About payload and extra fields. I think we need them, as they
>>>>      show the structure of criu images according to one that is
>>>>      described in our Images wiki page:
>>>>
>>>> All criu-specific image files begin with 32-bit magic cookie. Images in
>>>> PB format are followed by zero or more entries of the same type (not
>>>> size!), each entry is preceded with 32-bit entry size value (not
>>>> including this 32-bit value itself). Optionally each entry may be
>>>> followed by extra payload which depends on the entry type.
>>>>
>>>> IOW protocol-buffers image files look like
>>>>
>>>> IMAGE_FILE ::= MAGIC { ENTRY }
>>>> ENTRY      ::= SIZE PAYLOAD [ EXTRA ]
>>>> PAYLOAD    ::= "message encoded in ProtocolBuffer format"
>>>> EXTRA      ::= "arbitrary blob, depends on the PAYLOAD contents"
>>>>
>>>> MAGIC      ::= "32 bit integer"
>>>> SIZE       ::= "32 bit integer, equals the PAYLOAD length"
>>>>
>>>>      So the current structure with "payload" and "extra" fields
>>>>      makes perfect sense to me. The linear structure that
>>>>      we've used previously in our text-format confuses the user,
>>>>      just the way I was confused about pagemap\pipes-data image
>>>>      structure.
>>> Hm... The wiki page is not a rule written in stone. Let's make the
>>> json representation look human-readable and fix the wiki page. Like
>>> this:
>>>
>>> IMAGE_FILE ::= MAGIC { SIZE ENTRY [PAYLOAD] }
>>> ENTRY      ::= "message encoded in PB"
>>> PAYLOAD    ::= "arbitrary blob"
>>>
>>> :)
>>>
>>> Too many braces and extra fields make things harder to understand.
>> As I said before, linear { SIZE ENTRY [PAYLOAD] } structure is confusing.
> But this one and what's on wiki are equal.

Yes, this one is a low level description of what's on wiki =).

>> I like the one on wiki better, especially when describing pipes-data and
>> pagemap images. I think that even though there are extra fields and braces
>> involved, current structure looks a lot better and is self-describing.
>> And the crit sources look a lot better when sticking to wiki description.
>> I would really like to stick with wiki description.
> Right now we're arguing in "like" "don't like" terms. With this we won't
> make any decision, but "just do it" one. Thus I propose to stick to the
> only measurable argument we have -- less text is easier to read.

Yes, you are right, sorry for "like/don't like" terms.

> Is it technically possible to make the json text look like this:
>
>   {
>        "magic": "REG_FILES",
>        "entries": [
>            {
>                "foo": "1",
>                "bar": "BAR",
>                "extra": {
>                      /* entry-specific stuff here */
>                }
>            },
>
> ?
>
> And how bad would the code look like for this?

You mean like add "extra" field to the payload, right?
Yes, it does look better.

>>>> c) I wrap every value in "" as it isn't that easy to convert
>>>>      from pb to json, as theirs supported types do not match,
>>>>      and wrapping everything into strings is just the easiest
>>>>      way to get everything work. I'm currently thinking on
>>>>      a way to improve that, but for now it works just fine.
>>> OK. Can we at least keep strings as string, not as string with string?
>>> I mean this:
>>>
>>> "path": "/foo/bar"
>>>
>>> instead of
>>>
>>> "path": "\"/foo/bar\""
>> Yes, we can, but it is a next step, i'm working on it in whole context
>> of impoving pb<->json convertation.
> This step should be accomplished till v1.5 (beginning of March). Can we?

Yes, definitely.
But I'll try to fix it asap.

>>>>> 3. Let's now look at reg-files.img
>>>>>
>>>>> {
>>>>>       "magic": "REG_FILES",
>>>>>       "entries": [
>>>>>           {
>>>>>               "payload": {
>>>>>                   "fown": {
>>>>>                       "pid_type": "0",
>>>>>                       "signum": "0",
>>>>>                       "pid": "0",
>>>>>                       "uid": "0",
>>>>>                       "euid": "0"
>>>>>                   },
>>>>>                   "flags": "32770",
>>>>>                   "id": "1",
>>>>>                   "name": "\"/dev/null\"",
>>>>>                   "pos": "0"
>>>>>               }
>>>>>           },
>>>>>           {
>>>>>               "payload": {
>>>>>                   "name": "\"/\"",
>>>>>                   "fown": {
>>>>>                       "pid_type": "0",
>>>>>                       "signum": "0",
>>>>>                       "pid": "0",
>>>>>                       "uid": "0",
>>>>>                       "euid": "0"
>>>>>                   },
>>>>>                   "pos": "0",
>>>>>                   "flags": "33793",
>>>>>                   "id": "2",
>>>>>                   "size": "0"
>>>>>               }
>>>>>           },
>>>>>
>>>>> Other than those from inventory, the concerns here are:
>>>>>
>>>>> a) The order of fields differs from entry to entry.
>>>> It is normal for json =).
>>> Yes, but not for human eyes :)
>>>> I thought about it and concluded that it doesn't harm us in any way,
>>>> as we only care about the order in "repeated" fields and we treat
>>>> them properly by converting to an array, so the order is saved as
>>>> the order of elements in json arrays is preserved.
>>> How hard would it be to keep names in the order they are met in pb file?
>> Well, json specification says that the order shouldn't be preserved.
> I agree :)
>
>> Same thing with dicts, as the names are hashable, used to find
>> items and are tossed around .
> Absolutely.
>
>> Hm... But what if we just store protobuf-text format string inside json?
> I don't understand this. How would the text look like?

Here is how reg-files.img would look like:

{
      "magic": "REG_FILES",
      "entries": [
          {
              "payload": "
                  fown: {
                      pid_type: 0,
                      signum: 0,
                      pid: 0,
                      uid: 0,
                      euid: 0
                  },
                  flags: 32770,
                  id: 1,
                  name: "/dev/null",
                  pos: 0"
          },

So the payload is a string that contains pb entry in protobuf text format.

>> It would solve all those problems with convertation and saving the names
>> order, but whould brake integrity check(which is still weak compared to
>> protobuf)...
>>
>> I was also considering packing image into one protobuf message,
>> similar to current json, when it looks like:
>> message criu_image {
>>      required string magic = 1;
>>      repeated entry entries = 2;
>> }
>>
>> message criu_entry {
>>      required criu_payload payload = 1;
>>      optional criu_extra extra = 2;
>> }
>>
>> message criu_payload {
>>      required string name = 1; // for example "core"
>>      optional core_entry core = 1;
>>      ....
>> }
>>
>> message criu_extra {
>>      required string name = 1; // for example "pagemaps"
>>      repeated pagemap_entry pagemaps = 2;
>>      ...
>> }
> PB docs say, that the format is good for encoding "small" messages (without
> specification of how small the small is). According to this, keeping all
> that CRIU generates in one entry (with sub-entries) is not good idea.

Yes, but we are using it just to print the images. I mean, we will read them
from *.img, pack in criu_image structure and print using pb text format.
And our images don't usually contain large amounts of data.

>> And it now looks like it fits better, even though it requires
>> a bit more code to write(actually, it can be autogenerated =) ).
>> This way it will be still readable, easy to parse from file
>> and will preserve the names order.
>> What do you think?
>>
>>
>>>>> b) The name of the 2nd entry is just /, but in the output it takes 7 characters to print one.
>>>>>
>>>> Yep, that is a side-effect of that simplified pb<->json convertation.
>>>>
>>>>> 4. Next the pagemap.img
>>>>>
>>>>> {
>>>>>       "magic": "PAGEMAP",
>>>>>       "entries": [
>>>>>           {
>>>>>               "payload": {
>>>>>                   "pages_id": "1"
>>>>>               },
>>>>>               "extra": [
>>>>>                   {
>>>>>                       "nr_pages": "1",
>>>>>                       "vaddr": "4194304"
>>>>>                   },
>>>>>
>>>>> Similar to the inventory, field "extra" is obfuscating.
>>>>>
>>>> See 1.
>>> Disagree. The pagemap_entry-s are not "extra". They are entries w/o extra payload.
>> But wiki says that " Images in PB format are followed by zero or more entries of the _SAME TYPE_",
> Wiki is wrong here :) The EXTRA is stated to be "arbitrary blob", but it's
> not such in this case.

Why? It fits perfectly if we state that pagemap_entry's are the extra 
for pagemap_head.

>> which makes perfect sense to me. As I mentioned before, wiki description looks a
>> lot better than linear.
> The images we generate are not nice, yes. We even have a page describing how
> exactly they are not nice. But this is what we have, and starting developing
> new format is good task, but too early. Let's try to make crit decode produce
> as nice-looking output as can possibly be with the current images set.

Those messages (criu_image, criu_payload etc) are not for changing our 
images =).
They are just to contain and treat images within crit, just to keep them 
sorted and
nice-looking and be able to just parse human readable image using standart
ParseFromString protobuf method=).

> Thanks,
> Pavel
>