[CRIU] [PATCH 3/3] Issue #360: Anonymize image files

Harshavardhan Unnibhavi hvubfoss at gmail.com
Tue Jun 25 16:39:42 MSK 2019


On Tue, Jun 25, 2019 at 3:22 PM Pavel Emelianov <xemul at virtuozzo.com> wrote:
>
> >>> diff --git a/lib/py/strip.py b/lib/py/strip.py
> >>> new file mode 100644
> >>> index 00000000..4069275c
> >>> --- /dev/null
> >>> +++ b/lib/py/strip.py
> > This file is indented with spaces (IMHO we should be using spaces for
> > Python code); however, the rest of the code base uses tabs.
> > For consistency, it might be better to use tabs in this file as well?
>
> I've often been told that truly Pythonic indentation uses spaces, not tabs.
> That said, should we take sed and re-format all the Python code to spaces?
>
> >>> @@ -0,0 +1,66 @@
> >>> +# This file contains methods to deal with anonymising images.
> >>> +#
> >>> +# Contents being anonymised can be found at: https://github.com/checkpoint-restore/criu/issues/360
> > Could you please add the content that is being anonymised instead of
> > providing an external link to the github issue? This will be helpful
> > when reading the source code offline.
> >>> +#
> >>> +# Inorder to anonymise the image files three steps are followed:
> > s/Inorder/In order/g
> >>> +#    - decode the binary image to json
> >>> +#    - strip the necessary information from the json dict
> >>> +#    - encode the json dict back to a binary image, which is now anonymised
> >>> +
> >>> +import sys
> >>> +import json
> >>> +import random
> >>> +
> >>> +def files_anon(image):
> >>> +    levels = {}
> >>> +
> >>> +    for e in image['entries']:
> >>> +        f_path = e['reg']['name']
> > We should handle KeyError: 'reg', or check whether the 'reg' key exists.
> >>> +        f_path = f_path.split('/')
> >>> +
> >>> +        lev_num = 0
> >>> +        for p in f_path:
> >>> +            if p == '':
> >>> +                continue
> >>> +            if lev_num in levels.keys():
> >>> +                if p not in levels[lev_num].keys():
> > is .keys() necessary here?
> >>> +                    temp = list(p)
> >>> +                    random.shuffle(temp)
> >> Erm, I'm not 100% sure it's OK to anonymize file paths like that.
> > Computing a hash could be another option?
>
> Yes.
>
> I was also thinking of checking whether the first level is one of the "known"
> names like var, home, usr, etc., etc. ( ;) ) and not shuffling those.

Sure, I can make that change.
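For illustration, here is a minimal sketch of the hash-based variant discussed above, not the code from the patch. It assumes SHA-256 truncated to 16 hex characters, an optional salt, and a hypothetical `KNOWN_TOP_LEVEL` set of first-level names that are left in clear text. `anon_path`, `anon_component` and the `salt` parameter are hypothetical names. Entries without a 'reg' key are skipped, as requested in the review.

```python
import hashlib

# Hypothetical list of well-known first-level directories that are NOT
# anonymized (per the "var, home, usr, etc." suggestion); adjust as needed.
KNOWN_TOP_LEVEL = {"bin", "boot", "dev", "etc", "home", "lib", "lib64",
                   "opt", "proc", "root", "run", "sbin", "sys", "tmp",
                   "usr", "var"}


def anon_component(name, salt=b""):
    """Replace one path component with a short, deterministic hex digest."""
    return hashlib.sha256(salt + name.encode("utf-8")).hexdigest()[:16]


def anon_path(path, salt=b""):
    """Anonymize a path component by component.

    Only the first real component is compared against KNOWN_TOP_LEVEL;
    deeper components are always hashed. Equal names give equal digests,
    so directory structure shared between files is preserved.
    """
    out = []
    level = 0
    for part in path.split("/"):
        if part in ("", ".", ".."):
            out.append(part)  # keep slashes and relative markers as-is
            continue
        if level == 0 and part in KNOWN_TOP_LEVEL:
            out.append(part)
        else:
            out.append(anon_component(part, salt))
        level += 1
    return "/".join(out)


def files_anon(image, salt=b""):
    """Anonymize reg-file names in a decoded files image dict, in place."""
    for e in image["entries"]:
        reg = e.get("reg")
        if reg is None or "name" not in reg:
            continue  # not a regular-file entry, nothing to anonymize
        reg["name"] = anon_path(reg["name"], salt)
    return image
```

With this scheme "/home/alice/a" and "/home/alice/b" still share the same anonymized parent directory. Passing a secret salt makes it harder to recover common names by hashing a dictionary of guesses.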
>
> -- Pavel

Harsha

