[CRIU] [PATCH 3/3] Issue #360: Anonymize image files

Pavel Emelianov xemul at virtuozzo.com
Tue Jun 25 12:52:23 MSK 2019


>>> diff --git a/lib/py/strip.py b/lib/py/strip.py
>>> new file mode 100644
>>> index 00000000..4069275c
>>> --- /dev/null
>>> +++ b/lib/py/strip.py
> This file is indented with spaces (IMHO we should be using spaces for Python
> code); however, the rest of the code base uses tabs. For consistency, it
> might be better to use tabs in this file as well?

I've been told many times that truly pythonic indentation uses spaces, not tabs.
That said, should we just run sed over all the Python code and convert it to spaces?
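A minimal Python sketch of such a conversion. It assumes only leading tabs need replacing and that one tab becomes four spaces (the PEP 8 indent width). The helper names are hypothetical:

```python
import pathlib


def retab_line(line, width=4):
    """Replace each leading tab with `width` spaces; leave the rest as-is."""
    stripped = line.lstrip('\t')
    ntabs = len(line) - len(stripped)
    return ' ' * (width * ntabs) + stripped


def retab_file(path, width=4):
    """Rewrite one file in place; return True if anything changed."""
    p = pathlib.Path(path)
    text = p.read_text()
    new = ''.join(retab_line(l, width) for l in text.splitlines(keepends=True))
    if new != text:
        p.write_text(new)
    return new != text
```

Running `retab_file` on each path from `pathlib.Path('lib/py').rglob('*.py')` would cover that directory. Like a plain sed substitution, it also rewrites leading tabs inside multi-line string literals, so the result is worth a quick look before committing.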

>>> @@ -0,0 +1,66 @@
>>> +# This file contains methods to deal with anonymising images.
>>> +#
>>> +# Contents being anonymised can be found at: https://github.com/checkpoint-restore/criu/issues/360
> Could you please add the content that is being anonymised instead of 
> providing an external link to the github issue? This will be helpful 
> when reading the source code offline.
>>> +#
>>> +# Inorder to anonymise the image files three steps are followed:
> s/Inorder/In order/g
>>> +#    - decode the binary image to json
>>> +#    - strip the necessary information from the json dict
>>> +#    - encode the json dict back to a binary image, which is now anonymised
>>> +
>>> +import sys
>>> +import json
>>> +import random
>>> +
>>> +def files_anon(image):
>>> +    levels = {}
>>> +
>>> +    for e in image['entries']:
>>> +        f_path = e['reg']['name']
> We should handle KeyError: 'reg', or check whether the 'reg' key exists.
>>> +        f_path = f_path.split('/')
>>> +
>>> +        lev_num = 0
>>> +        for p in f_path:
>>> +            if p == '':
>>> +                continue
>>> +            if lev_num in levels.keys():
>>> +                if p not in levels[lev_num].keys():
> is .keys() necessary here?
>>> +                    temp = list(p)
>>> +                    random.shuffle(temp)
>> Erm, I'm not 100% sure it's OK to anonymize file paths like that.
> Computing a hash could be another option?

Yes.

I was also thinking of checking whether the 1st-level name is one of the "known"
ones like var, home, usr, etc, etc ( ;) ) and not shuffling those.
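A minimal sketch of both ideas. It hashes each path component instead of shuffling its letters, and it leaves a whitelist of well-known first-level directories untouched. It also addresses the KeyError point raised above by skipping entries that have no 'reg' key. The whitelist contents, the 16-hex-digit truncation, the optional salt, and all helper names are assumptions, not part of the patch:

```python
import hashlib

# Hypothetical whitelist of well-known first-level directories that are
# kept verbatim; the exact set is an assumption, not taken from the patch.
KNOWN_TOP_LEVEL = frozenset({
    'bin', 'dev', 'etc', 'home', 'lib', 'opt', 'proc',
    'root', 'run', 'sys', 'tmp', 'usr', 'var',
})


def anon_component(name, salt=b''):
    """Map one path component to a stable 16-hex-digit token.

    The same (name, salt) pair always yields the same token, so a
    directory keeps one anonymised name across all image entries.
    """
    return hashlib.sha256(salt + name.encode('utf-8')).hexdigest()[:16]


def anon_path(path, salt=b''):
    """Hash every component except a whitelisted first-level name."""
    out = []
    level = 0
    for p in path.split('/'):
        if p == '':
            # Preserve the leading '/' and any empty components.
            out.append(p)
            continue
        if level == 0 and p in KNOWN_TOP_LEVEL:
            out.append(p)
        else:
            out.append(anon_component(p, salt))
        level += 1
    return '/'.join(out)


def files_anon(image, salt=b''):
    """Anonymise the 'reg' file names in a decoded files image, in place."""
    for e in image['entries']:
        reg = e.get('reg')
        if reg is None or 'name' not in reg:
            # Not a regular-file entry: nothing to strip here.
            continue
        reg['name'] = anon_path(reg['name'], salt)
    return image
```

Because the digest depends only on the name and salt, no per-level cache like the patch's `levels` dict is needed to keep names consistent. Shuffled letters, by contrast, still reveal each name's characters and length. With an empty salt, common names could be recovered by hashing a list of candidates. Using one secret salt for all images of a single dump would avoid that and still keep names consistent between images.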

-- Pavel


