[CRIU] [PATCH] [RFC] criu: test different situations when parasite must cure itself

Mon Oct 27 00:45:10 PDT 2014

On Mon, Oct 20, 2014 at 07:03:43PM +0400, Pavel Emelyanov wrote:
> On 10/20/2014 09:43 AM, Andrew Vagin wrote:
> > On Fri, Oct 17, 2014 at 05:21:54PM +0400, Pavel Emelyanov wrote:
> >> On 10/15/2014 03:24 PM, Andrey Vagin wrote:
> >>> Here is a simple fault-injection engine. Each fault has a unique code.
> >>> One of these codes can be set in the CRIU_FAULT environment variable.
> >>> On the next run the corresponding fault will be triggered.
> >>> For each fault we need code that emulates the specified behaviour.
> >>>
> >>> This patch checks the following cases:
> >>> * a parasite socket is closed unexpectedly
> >>> * the parasite receives an unsupported command
> >>> * something fails while a parasite daemon is running
> >>> * criu dies unexpectedly
> >>>
> >>> Fault-injection code is compiled only if make is executed with DEBUG=1.
> >>>
> >>> The following command can be used to check all existing fault cases:
> >>> make -C test fault-injection
> >>
> >> We have a systemtap-based fault injection. Why is this version better?
> > 
> > It's much simpler.
> > It tests more cases.
> 
> Questionable. With systemtap we could fail at any point of criu,
> with this -- only where your code is placed.

With systemtap we can't fail at any point. For example, systemtap doesn't
allow returning from a function at a specified place with a specified
code.

> 
> > Systemtap requires kernel-debug and loading kernel modules.
> > The systemtap version failed sometimes. I don't remember the reason, but
> > it wasn't related to criu.
> 
> I see. I'm not extremely happy with 1 explicit failure point. Can we
> come up with some more generic failure-injection rather than a single
> arbitrarily chosen point? CRIU can crash at any place (in theory).

We can, but that is a different test, and writing it requires taking many
things into account. That technique is called "fuzz testing".

This one is a regression test that performs specific checks. The difference
between the two kinds of test is similar to the difference between maps007
and the other mapsXX tests.

This patch doesn't check just the crash case. It also checks error paths
and a wrong parasite command. We can add hooks to check other paths. The
test also runs very fast, and its failures are easy to investigate.
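
To make the idea of a hook concrete, here is a minimal sketch of how a
fault code taken from CRIU_FAULT could be matched against a hook point.
It is not the code from the patch: the enum names, their numeric values,
and the helpers fault_parse(), fault_from_env() and fault_injected() are
all hypothetical and chosen only for illustration. Only the CRIU_FAULT
variable name and the four cases come from the patch description.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical fault codes; the names and values in the real patch may differ. */
enum fault_code {
	FAULT_NONE = 0,
	FAULT_SOCKET_CLOSED,	/* parasite socket closed unexpectedly */
	FAULT_BAD_COMMAND,	/* parasite receives an unsupported command */
	FAULT_DAEMON_ERROR,	/* something fails while the daemon is running */
	FAULT_CRIU_DIES,	/* criu dies unexpectedly */
	FAULT_MAX
};

/* Convert a decimal string into a fault code; anything invalid means "no fault". */
enum fault_code fault_parse(const char *val)
{
	char *end;
	long code;

	if (!val || !*val)
		return FAULT_NONE;

	code = strtol(val, &end, 10);
	if (*end != '\0' || code <= FAULT_NONE || code >= FAULT_MAX)
		return FAULT_NONE;

	return (enum fault_code)code;
}

/* Read the requested fault from the CRIU_FAULT environment variable. */
enum fault_code fault_from_env(void)
{
	return fault_parse(getenv("CRIU_FAULT"));
}

/* A hook calls this with its own code and fails only if that code was requested. */
int fault_injected(enum fault_code requested, enum fault_code point)
{
	assert(point > FAULT_NONE && point < FAULT_MAX);
	return requested == point;
}
```

A hook placed on an error path would then look roughly like
"if (fault_injected(fault_from_env(), FAULT_SOCKET_CLOSED)) close the
socket", so adding a new checked path means adding one code and one such
call. In the patch this code is built only with DEBUG=1; the sketch
leaves that build-time guard out.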

It's always good to have both types of tests.