[CRIU] [PATCH] zdtm: check that a command completes successfully after a fault (v2)

Tue Mar 1 07:50:36 PST 2016

On Tue, Mar 01, 2016 at 05:53:17PM +0300, Pavel Emelyanov wrote:
> On 03/01/2016 05:23 PM, Andrey Vagin wrote:
> > 2016-03-01 2:01 GMT-08:00 Pavel Emelyanov <xemul at virtuozzo.com>:
> >> On 03/01/2016 03:04 AM, Andrey Vagin wrote:
> >>> From: Andrew Vagin <avagin at virtuozzo.com>
> >>>
> >>> I suggest to inject a fault and then try to execute the same command
> >>> again without a fault to check that it will complete successfully.
> >>>
> >>> v2: skip a parasite blob when we are checking vma-s
> >>> Signed-off-by: Andrew Vagin <avagin at virtuozzo.com>
> >>> ---
> >>>  test/zdtm.py | 40 +++++++++++++++++++++++++++++-----------
> >>>  1 file changed, 29 insertions(+), 11 deletions(-)
> >>>
> >>> diff --git a/test/zdtm.py b/test/zdtm.py
> >>> index 1ace919..27fa8d4 100755
> >>> --- a/test/zdtm.py
> >>> +++ b/test/zdtm.py
> >>> @@ -656,13 +656,31 @@ class criu_cli:
> >>>
> >>>               preexec = self.__user and self.set_user_id or None
> >>>
> >>> -             ret = self.__criu(action, s_args, self.__fault, strace, preexec)
> >>> -             grep_errors(os.path.join(self.__ddir(), log))
> >>> -             if ret != 0:
> >>> -                     if self.__fault or self.__test.blocking() or (self.__sat and action == 'restore'):
> >>> -                             raise test_fail_expected_exc(action)
> >>> -                     else:
> >>> -                             raise test_fail_exc("CRIU %s" % action)
> >>> +             faults = [ self.__fault ]
> >>> +             # try again after the first failed case
> >>> +             if self.__fault:
> >>> +                     faults.append(None)
> >>> +             for fault in faults:
> >>> +                     __ddir = self.__ddir()
> >>> +
> >>> +                     ret = self.__criu(action, s_args, fault, strace, preexec)
> >>> +                     grep_errors(os.path.join(__ddir, log))
> >>> +                     if ret != 0:
> >>> +                             if fault:
> >>> +                                     try_run_hook(self.__test, ["--fault", action])
> >>> +                                     if action == "dump":
> >>> +                                             __ddir_fail = __ddir + ".fail"
> >>> +                                             os.rename(__ddir, __ddir + ".fail")
> >>> +                                             os.mkdir(__ddir)
> >>> +                                             os.chmod(__ddir, 0777)
> >>> +                                     else:
> >>> +                                             os.rename(os.path.join(__ddir, log), os.path.join(__ddir, log + ".fail"))
> >>
> >> What does this dir manipulation do?
> > 
> > On dump this directory will contain some of the images, so we move the
> > whole directory.
> > On restore we move only a log file.
> 
> Move where and what for? There was no log file moving in this place.

Rename into DIRNAME.fail. This avoids a situation where we have images
left over from a previous run. We have a few optional images, and if we
execute dump a second time, we can get a mix of images from the first and
second runs.

We need to move the log file to keep it for later investigation. We
need to rename the log file because we don't know when a fault will be
injected.
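
To make the two cases concrete, here is a minimal standalone sketch of the
cleanup step between the faulted run and the retry. It is not the code from
the patch: the function name set_aside_failed_run and its parameters are
made up for illustration, and it uses Python 3 octal syntax (0o777) where
the patch uses 0777.

```python
import os


def set_aside_failed_run(ddir, log, action):
    """Move what a fault-injected run left behind, so a retry starts clean.

    Hypothetical helper: ddir is the images directory, log is the log
    file name inside it, action is "dump" or "restore".
    """
    if action == "dump":
        # A failed dump may leave a partial set of images, including
        # optional ones. Move the whole directory away and recreate it
        # empty, so the second dump cannot mix images from both runs.
        os.rename(ddir, ddir + ".fail")
        os.mkdir(ddir)
        os.chmod(ddir, 0o777)
    else:
        # On restore the images are still needed, so only the log of
        # the failed attempt is renamed to keep it for investigation.
        os.rename(os.path.join(ddir, log),
                  os.path.join(ddir, log + ".fail"))
```

Note that os.rename() fails if DIRNAME.fail already exists and is not
empty, so this sketch assumes a fresh test directory for each run.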

> 
> -- Pavel