[CRIU] [PATCH 0/7] CRiu Image Tool(crit), v4

Tue Jan 13 06:35:24 PST 2015

>>
>>> b) About payload and extra fields. I think we need them, as they
>>>     show the structure of criu images according to one that is
>>>     described in our Images wiki page:
>>>
>>> All criu-specific image files begin with 32-bit magic cookie. Images in
>>> PB format are followed by zero or more entries of the same type (not
>>> size!), each entry is preceded with 32-bit entry size value (not
>>> including this 32-bit value itself). Optionally each entry may be
>>> followed by extra payload which depends on the entry type.
>>>
>>> IOW protocol-buffers image files look like
>>>
>>> IMAGE_FILE ::= MAGIC { ENTRY }
>>> ENTRY      ::= SIZE PAYLOAD [ EXTRA ]
>>> PAYLOAD    ::= "message encoded in ProtocolBuffer format"
>>> EXTRA      ::= "arbitrary blob, depends on the PAYLOAD contents"
>>>
>>> MAGIC      ::= "32 bit integer"
>>> SIZE       ::= "32 bit integer, equals the PAYLOAD length"
>>>
>>>     So the current structure with "payload" and "extra" fields
>>>     makes perfect sense to me. The linear structure that
>>>     we've used previously in our text-format confuses the user,
>>>     just the way I was confused about pagemap\pipes-data image
>>>     structure.
>> Hm... The wiki page is not a rule written in stone. Let's make the
>> json representation look human-readable and fix the wiki page. Like
>> this:
>>
>> IMAGE_FILE ::= MAGIC { SIZE ENTRY [PAYLOAD] }
>> ENTRY      ::= "message encoded in PB"
>> PAYLOAD    ::= "arbitrary blob"
>>
>> :)
>>
>> Too many braces and extra fields make things harder to understand.
> 
> As I said before, linear { SIZE ENTRY [PAYLOAD] } structure is confusing.

But this one and what's on wiki are equal.

> I like the one on wiki better, especially when describing pipes-data and
> pagemap images. I think that even though there are extra fields and braces
> involved, current structure looks a lot better and is self-describing.
> And the crit sources look a lot better when sticking to wiki description.
> I would really like to stick with wiki description.

Right now we're arguing in "like" "don't like" terms. With this we won't
make any decision, but "just do it" one. Thus I propose to stick to the
only measurable argument we have -- less text is easier to read.

Is it technically possible to make the json text look like this:

 {
      "magic": "REG_FILES",
      "entries": [
          {
              "foo": "1",
              "bar": "BAR",
              "extra": {
                    /* entry-specific stuff here */
              }
          },

?

And how bad would the code look like for this?

>>> c) I wrap every value in "" as it isn't that easy to convert
>>>     from pb to json, as theirs supported types do not match,
>>>     and wrapping everything into strings is just the easiest
>>>     way to get everything work. I'm currently thinking on
>>>     a way to improve that, but for now it works just fine.
>> OK. Can we at least keep strings as string, not as string with string?
>> I mean this:
>>
>> "path": "/foo/bar"
>>
>> instead of
>>
>> "path": "\"/foo/bar\""
> 
> Yes, we can, but it is a next step, i'm working on it in whole context
> of impoving pb<->json convertation.

This step should be accomplished till v1.5 (beginning of March). Can we?

>>>> 3. Let's now look at reg-files.img
>>>>
>>>> {
>>>>      "magic": "REG_FILES",
>>>>      "entries": [
>>>>          {
>>>>              "payload": {
>>>>                  "fown": {
>>>>                      "pid_type": "0",
>>>>                      "signum": "0",
>>>>                      "pid": "0",
>>>>                      "uid": "0",
>>>>                      "euid": "0"
>>>>                  },
>>>>                  "flags": "32770",
>>>>                  "id": "1",
>>>>                  "name": "\"/dev/null\"",
>>>>                  "pos": "0"
>>>>              }
>>>>          },
>>>>          {
>>>>              "payload": {
>>>>                  "name": "\"/\"",
>>>>                  "fown": {
>>>>                      "pid_type": "0",
>>>>                      "signum": "0",
>>>>                      "pid": "0",
>>>>                      "uid": "0",
>>>>                      "euid": "0"
>>>>                  },
>>>>                  "pos": "0",
>>>>                  "flags": "33793",
>>>>                  "id": "2",
>>>>                  "size": "0"
>>>>              }
>>>>          },
>>>>
>>>> Other than those from inventory, the concerns here are:
>>>>
>>>> a) The order of fields differs from entry to entry.
>>> It is normal for json =).
>> Yes, but not for human eyes :)
>>> I thought about it and concluded that it doesn't harm us in any way,
>>> as we only care about the order in "repeated" fields and we treat
>>> them properly by converting to an array, so the order is saved as
>>> the order of elements in json arrays is preserved.
>> How hard would it be to keep names in the order they are met in pb file?
> 
> Well, json specification says that the order shouldn't be preserved.

I agree :)

> Same thing with dicts, as the names are hashable, used to find
> items and are tossed around .

Absolutely.

> Hm... But what if we just store protobuf-text format string inside json?

I don't understand this. How would the text look like?

> It would solve all those problems with convertation and saving the names
> order, but whould brake integrity check(which is still weak compared to
> protobuf)...
> 
> I was also considering packing image into one protobuf message,
> similar to current json, when it looks like:
> message criu_image {
>     required string magic = 1;
>     repeated entry entries = 2;
> }
> 
> message criu_entry {
>     required criu_payload payload = 1;
>     optional criu_extra extra = 2;
> }
> 
> message criu_payload {
>     required string name = 1; // for example "core"
>     optional core_entry core = 1;
>     ....
> }
> 
> message criu_extra {
>     required string name = 1; // for example "pagemaps"
>     repeated pagemap_entry pagemaps = 2;
>     ...
> }

PB docs say, that the format is good for encoding "small" messages (without
specification of how small the small is). According to this, keeping all
that CRIU generates in one entry (with sub-entries) is not good idea.

> And it now looks like it fits better, even though it requires
> a bit more code to write(actually, it can be autogenerated =) ).
> This way it will be still readable, easy to parse from file
> and will preserve the names order.
> What do you think?
> 
> 
>>>> b) The name of the 2nd entry is just /, but in the output it takes 7 characters to print one.
>>>>
>>> Yep, that is a side-effect of that simplified pb<->json convertation.
>>>
>>>> 4. Next the pagemap.img
>>>>
>>>> {
>>>>      "magic": "PAGEMAP",
>>>>      "entries": [
>>>>          {
>>>>              "payload": {
>>>>                  "pages_id": "1"
>>>>              },
>>>>              "extra": [
>>>>                  {
>>>>                      "nr_pages": "1",
>>>>                      "vaddr": "4194304"
>>>>                  },
>>>>
>>>> Similar to the inventory, field "extra" is obfuscating.
>>>>
>>> See 1.
>> Disagree. The pagemap_entry-s are not "extra". They are entries w/o extra payload.
> 
> But wiki says that " Images in PB format are followed by zero or more entries of the _SAME TYPE_",

Wiki is wrong here :) The EXTRA is stated to be "arbitrary blob", but it's
not such in this case.

> which makes perfect sense to me. As I mentioned before, wiki description looks a
> lot better than linear.

The images we generate are not nice, yes. We even have a page describing how
exactly they are not nice. But this is what we have, and starting developing
new format is good task, but too early. Let's try to make crit decode produce
as nice-looking output as can possibly be with the current images set.

Thanks,
Pavel