[CRIU] [PATCH] test, unix: Exhaustive testing of states

Pavel Emelyanov xemul at virtuozzo.com
Fri Dec 2 08:13:17 PST 2016


By exhaustive testing I understand a test suite that generates as much
states to try to C/R as possible by trying all the possible sequences
of system calls. Since such a generation, if done on all the Linux API
we support in CRIU, would produce bazillions of process, I propose to
start with something simple.

As a starting point -- unix stream sockets with abstract names that
can be created and used by a single process :)

The script generates situations in which unix sockets can get into by
using a pre-defined set of system calls. In this patch the syscalls
are socket, listen, bind, accept, connect and send. Also the nummber
of system calls to use (i.e. -- the depth of the tree) is limited by
the --depth option.

There are three things that can be done with a generated 'state':

I) Generate :) and show

Generation is done by recursively doing everything that is possible
(and makes sence) in a given state. To reduce the size of the tree
some meaningless branches are cut, e.g. creating a socket and closing
it right after that, creating two similar sockets one-by-one and some
more.

Shown on the screen is a cryptic string, e.g. 'SA-CX-MX_SBL one,
describing the sockets in the state. This is how it can be decoded:

 - sockets are delimited with _
 - first goes type (S -- stream, D --datagram)
 - next goes name state (A -- no name, B with name, X socket is not in
   FD table, i.e. closed or not yet accepted)
 - next may go letter L meaning that the socket is listening
 - -Cx -- socket is connected and x is the peer's name state
 - -Ixyz -- socket has incoming connections queue and xyz are the
   connect()-ors name states
 - -Mxyz -- socket has messages and xyz is senders' name states

The example above means, that we have two sockets:

 - SA-CX-MX: stream, with no name, connected to a dead one and with a
   message from a dead one
 - SBL: stream, with name, listening

Next printed is the sequence of system calls to get into it, e.g. this
is how to get into the state above:

	socket(S) = 1
	bind(1, $name-1)
	listen(1)
	socket(S) = 2
	connect(2, $name-1)
	accept(1) = 3
	send(2, $message-0)
	send(3, $message-0)
	close(3)

Program has created a stream socket, bound it, listened it, then
created another stream socket, connected to the 1st one, then accepted
the connection sent two messages vice-versa and closed the accepted
end, so the 1st socket left connected to the dead socket with a
message from it.


II) Run the state

This is when test actually creates a process that does the syscalls
required to get into the generated state (and hopefully gets into it).


III) Check C/R of the state

This is the trickiest part when it comes to the R step -- it's not
clear how to validate that the state restored is correct. But if only
trying to dump the state -- it's just calling criu dump. As images dir
the state string description is used.


One may choose only to generate the states with --gen option. One may
choose only to run the states with --run option. The latter is useful
to verify that the states generator is actually producing valid
states. If no options given, the state is also dump-ed (restore is to
come later).


For now the usage experience is like this:

- Going --depth 10 --gen (i.e. just generating all possibles states
  that are acheivable with 10 syscalls) produces 44 unique states for
  0.01 seconds. The generated result covers some static tests we have
  in zdtm :)  More generation stats is like this:
   --depth 15 : 1.1 sec   / 72 states
   --depth 18 : 13.2 sec  / 89 states
   --depth 20 : 1 m 8 sec / 101 state

- Running and trying with criu is checked with --depth 9. Criu fails
  to dump the state SA-CX-MX_SBL (shown above) with the error

  Error (criu/sk-queue.c:151): recvmsg fail: error: Connection reset by peer


Nearest plans:

1. Add generators for on-disk sockets names (now oly abstract).
   Here an interesting case is when names overlap and one socket gets
   a name of another, but isn't accessible by it

2. Add datagram sockets.
   Here it'd be fun to look at how many-to-one connections are
   generated and checked.

3. Add socketpair()-s.


Farther plans:

1. Cut the tree better to allow for deeper tree scan.

2. Add restore.

3. Add SCM-s

4. Have the exhaustive testing for other resources.


Signed-off-byr Pavel Emelyanov <xemul at virtuozzo.com>
---
 test/exhaustive/unix.py | 440 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 440 insertions(+)
 create mode 100755 test/exhaustive/unix.py

diff --git a/test/exhaustive/unix.py b/test/exhaustive/unix.py
new file mode 100755
index 0000000..504b1f0
--- /dev/null
+++ b/test/exhaustive/unix.py
@@ -0,0 +1,440 @@
+#!/usr/bin/env python
+
+import sys
+import os
+import socket
+import argparse
+import time
+import subprocess
+import signal
+
+criu_bin='../../criu/criu'
+
+sk_type_s = {
+	socket.SOCK_STREAM:    "S",
+	socket.SOCK_DGRAM:     "D",
+}
+
+# Actions that can be done by test. Actions are not only syscall
+# names to call, but also arguments with which to do it
+#
+# Each action consists of
+# - arguments, e.g. type of socket, or socket id to work on
+# - act() method which just generates an record
+# - do() method, that actually does what's required
+# - show() method to return the string description of what's done
+
+def mk_socket(st, typ):
+	st.sk_id += 1
+	sk = sock(st.sk_id, typ)
+	st.add_socket(sk)
+	return sk
+
+class act_socket:
+	def __init__(self, typ):
+		self.typ = typ
+
+	def act(self, st):
+		sk = mk_socket(st, self.typ)
+		self.sk_id = sk.sk_id
+
+	def do(self, st):
+		sk = socket.socket(socket.AF_UNIX, self.typ, 0)
+		st.real_sockets[self.sk_id] = sk
+
+	def show(self):
+		return 'socket(%s) = %d' % (sk_type_s[self.typ], self.sk_id)
+
+
+class act_close:
+	def __init__(self, sk_id):
+		self.sk_id = sk_id
+
+	def act(self, st):
+		st.del_socket(self.sk_id)
+
+	def do(self, st):
+		sk = st.real_sockets.pop(self.sk_id)
+		sk.close()
+
+	def show(self):
+		return 'close(%d)' % self.sk_id
+
+
+class act_listen:
+	def __init__(self, sk_id):
+		self.sk_id = sk_id
+
+	def act(self, st):
+		sk = st.get_socket(self.sk_id)
+		sk.listen = True
+
+	def do(self, st):
+		sk = st.real_sockets[self.sk_id]
+		sk.listen(10)
+
+	def show(self):
+		return 'listen(%d)' % self.sk_id
+
+
+class act_bind:
+	def __init__(self, sk_id, name_id):
+		self.sk_id = sk_id
+		self.name_id = name_id
+
+	def act(self, st):
+		sk = st.get_socket(self.sk_id)
+		sk.name = self.name_id
+
+	def do(self, st):
+		sk = st.real_sockets[self.sk_id]
+		sk.bind(sock.real_name_for(self.name_id))
+
+	def show(self):
+		return 'bind(%d, $name-%d)' % (self.sk_id, self.name_id)
+
+
+class act_connect:
+	def __init__(self, sk_id, listen_sk_id):
+		self.sk_id = sk_id
+		self.lsk_id = listen_sk_id
+
+	def act(self, st):
+		sk = st.get_socket(self.sk_id)
+		lsk = st.get_socket(self.lsk_id)
+		psk = mk_socket(st, socket.SOCK_STREAM)
+		psk.visible = False
+		sk.peer = psk.sk_id
+		psk.peer = sk.sk_id
+		psk.name = lsk.name
+		lsk.icons.append(psk.sk_id)
+		lsk.icons_seq += 1
+
+	def do(self, st):
+		sk = st.real_sockets[self.sk_id]
+		sk.connect(sock.real_name_for(self.lsk_id))
+
+	def show(self):
+		return 'connect(%d, $name-%d)' % (self.sk_id, self.lsk_id)
+
+
+class act_accept:
+	def __init__(self, sk_id):
+		self.sk_id = sk_id
+
+	def act(self, st):
+		lsk = st.get_socket(self.sk_id)
+		iid = lsk.icons.pop(0)
+		nsk = st.get_socket(iid)
+		nsk.visible = True
+		self.nsk_id = nsk.sk_id
+
+	def do(self, st):
+		sk = st.real_sockets[self.sk_id]
+		nsk, ai = sk.accept()
+		if self.nsk_id in st.real_sockets:
+			raise Exception("SK ID conflict")
+		st.real_sockets[self.nsk_id] = nsk
+
+	def show(self):
+		return 'accept(%d) = %d' % (self.sk_id, self.nsk_id)
+
+
+class act_sendmsg:
+	def __init__(self, sk_id):
+		self.sk_id = sk_id
+
+	def act(self, st):
+		sk = st.get_socket(self.sk_id)
+		msg = (sk.sk_id, sk.outseq)
+		self.msg_id = sk.outseq
+		sk.outseq += 1
+		psk = st.get_socket(sk.peer)
+		psk.inqueue.append(msg)
+
+	def do(self, st):
+		sk = st.real_sockets[self.sk_id]
+		sk.send('MSG%d' % self.msg_id)
+
+	def show(self):
+		return 'send(%d, $message-%d)' % (self.sk_id, self.msg_id)
+
+#
+# Description of a socket
+#
+class sock:
+	def __init__(self, sk_id, sock_type):
+		# ID of a socket. Since states and sockets are cloned
+		# while we scan the tree of states the only valid way
+		# to address a socket is to find one by ID.
+		self.sk_id = sk_id
+		# The socket.SOCK_FOO value
+		self.sk_type = sock_type
+		# Sockets that haven't yet been accept()-ed are in the
+		# state, but user cannot operate on them. Also this
+		# invisibility contributes to state descriptionm since
+		# connection to not accepted socket is not the same
+		# as connection to accepted one.
+		self.visible = True
+		# The listen() was called.
+		self.listen = False
+		# The bind() was called. Also set by accept(), the name
+		# inherits from listener.
+		self.name = None
+		# The connect() was called. Set on two sockets when the
+		# connect() is called.
+		self.peer = None
+		# Progress on accepting connections. Used to check when
+		# it's OK to close the socket (see comment below).
+		self.icons_seq = 0
+		# List of IDs of sockets that can be accept()-ed
+		self.icons = []
+		# Number to generate message contents.
+		self.outseq = 0
+		# Incoming queue of messages.
+		self.inqueue = []
+
+	def clone(self):
+		sk = sock(self.sk_id, self.sk_type)
+		sk.visible = self.visible
+		sk.listen = self.listen
+		sk.name = self.name
+		sk.peer = self.peer
+		sk.icons_seq = self.icons_seq
+		sk.icons = list(self.icons)
+		sk.outseq = self.outseq
+		sk.inqueue = list(self.inqueue)
+		return sk
+
+	def get_actions(self, st):
+		act_list = []
+
+		if not self.visible:
+			return act_list
+
+		# Any socket can be closed, but closing a socket
+		# that hasn't contributed to some new states is
+		# just waste of time, so we close only connected
+		# sockets or listeners that has at least one
+		# incoming connection pendig or served
+
+		if self.listen:
+			if self.icons:
+				act_list.append(act_accept(self.sk_id))
+			if self.icons_seq:
+				act_list.append(act_close(self.sk_id))
+		elif self.peer:
+			act_list.append(act_close(self.sk_id))
+			# Connected sockets can send and receive messages
+			# But receiving seem not to produce any new states,
+			# so only sending
+			# Also sending to a closed socket doesn't work
+			if st.get_socket(self.peer, True):
+				act_list.append(act_sendmsg(self.sk_id))
+		else:
+			for psk in st.sockets:
+				if psk.listen and psk.name:
+					act_list.append(act_connect(self.sk_id, psk.sk_id))
+
+			# Listen on not-bound socket is prohibited as
+			# well as binding a listening socket
+			if not self.name:
+				# TODO: support for file paths (see real_name_for)
+				# TODO: these names can overlap each other
+				act_list.append(act_bind(self.sk_id, self.sk_id))
+			else:
+				act_list.append(act_listen(self.sk_id))
+
+
+		return act_list
+
+	@staticmethod
+	def name_of(sk):
+		if not sk:
+			return 'X'
+		elif not sk.visible:
+			return 'X'
+		elif sk.name:
+			return 'B'
+		else:
+			return 'A'
+
+	@staticmethod
+	def real_name_for(sk_id):
+		return "\0" + "CRSK%d" % sk_id
+
+	# The describe() generates a string that represents
+	# a state of a socket. Called by state.describe(), see
+	# comment there about what description is.
+	def describe(self, st):
+		dsc = '%s' % sk_type_s[self.sk_type]
+		dsc += sock.name_of(self)
+
+		if self.listen:
+			dsc += 'L'
+		if self.peer:
+			psk = st.get_socket(self.peer, True)
+			dsc += '-C%s' % sock.name_of(psk)
+		if self.icons:
+			i_dsc = ''
+			for c in self.icons:
+				psk = st.get_socket(c)
+				psk = st.get_socket(psk.peer, True)
+				i_dsc += sock.name_of(psk)
+			dsc += '-I%s' % i_dsc
+		if self.inqueue:
+			froms = set()
+			for m in self.inqueue:
+				froms.add(m[0])
+			q_dsc = ''
+			for f in froms:
+				fsk = st.get_socket(f, True)
+				q_dsc += sock.name_of(fsk)
+			dsc += '-M%s' % q_dsc
+		return dsc
+
+
+class state:
+	def __init__(self):
+		self.sockets = []
+		self.sk_id = 0
+		self.steps = []
+		self.real_sockets = {}
+
+	def add_socket(self, sk):
+		self.sockets.append(sk)
+
+	def del_socket(self, sk_id):
+		sk = self.get_socket(sk_id)
+		self.sockets.remove(sk)
+
+	def get_socket(self, sk_id, can_be_null = False):
+		for sk in self.sockets:
+			if sk.sk_id == sk_id:
+				return sk
+
+		if not can_be_null:
+			raise Exception("%d socket not in list" % sk_id)
+
+		return None
+
+	def get_actions(self):
+		act_list = []
+
+		# Any socket in the state we can change it
+		for sk in self.sockets:
+			act_list += sk.get_actions(self)
+
+		# If no sockets can change its state let's
+		# cheer up the state by throwing a new socket
+		# into the game
+		if not act_list:
+			act_list.append(act_socket(socket.SOCK_STREAM))
+			# TODO: add DGRAM sockets
+			# act_list.append((act_socket, socket.SOCK_DGRAM))
+
+
+		return act_list
+
+	def clone(self):
+		nst = state()
+		for sk in self.sockets:
+			nst.sockets.append(sk.clone())
+		nst.sk_id = self.sk_id
+		nst.steps = list(self.steps)
+		return nst
+
+	# Generates textual description of a state. Different states
+	# may have same descriptions, e.g. if we have two sockets and
+	# only one of them is in listen state, we don't care which 
+	# one in which. At the same time really different states 
+	# shouldn't map to the same string.
+	def describe(self):
+		sks = filter(lambda x: x.visible, self.sockets)
+		sks = map(lambda x: x.describe(self), sks)
+		sks = sorted(sks)
+		return '_'.join(sks)
+
+
+def chk_state(st, opts):
+	print "Will check state"
+
+	pid = os.fork()
+	if pid == 0:
+		while True:
+			time.sleep(1000)
+		sys.exit(1)
+
+	# Now dump the guy.
+	# FIXME Ideally this call should be performed by the run_state's
+	# pid!=0 branch.
+	img_path = "sti_" + st.describe()
+	os.mkdir(img_path)
+	try:
+		subprocess.check_call([criu_bin, "dump", "-t", "%d" % pid, "-D", img_path, "-v4", "-o", "dump.log", "-j"])
+	except:
+		os.kill(pid, signal.SIGKILL)
+		sys.exit(1)
+
+
+def run_state(st, opts):
+	print "Will run state"
+	pid = os.fork()
+	if pid != 0:
+		wpid, status = os.wait()
+		if os.WIFEXITED(status):
+			status = os.WEXITSTATUS(status)
+		if status != 0:
+			print "` FAIL"
+			sys.exit(1)
+
+		return
+
+	# Try the states in subprocess so that once
+	# it exits the created sockets are removed
+	for step in st.steps:
+		step.do(st)
+
+	if not opts.run:
+		chk_state(st, opts)
+
+	sys.exit(0)
+
+
+def proceed(st, seen, opts, depth = 0):
+	desc = st.describe()
+	if depth != 0 and desc and not desc in seen:
+		# When scanning the tree we run and try only states that
+		# differ, but don't stop tree traversal on them. This is
+		# because sometimes we can get into the already seen state
+		# using less steps and it's better to proceed as we have
+		# depth to move forward and generate more states.
+		seen.add(desc)
+		print '%s' % desc
+		for s in st.steps:
+			print '\t%s' % s.show()
+
+		if not opts.gen:
+			run_state(st, opts)
+
+	if depth >= opts.depth:
+		return
+
+	actions = st.get_actions()
+	for act in actions:
+		nst = st.clone()
+		act.act(nst)
+		nst.steps.append(act)
+		proceed(nst, seen, opts, depth + 1)
+
+
+p = argparse.ArgumentParser("CRIU test suite")
+p.add_argument("--depth", help = "Depth of generated tree", default = '8')
+p.add_argument("--gen", help = "Only generate and show states", action = 'store_true')
+p.add_argument("--run", help = "Run the states, but don't C/R", action = 'store_true')
+opts = p.parse_args()
+opts.depth = int(opts.depth)
+st = state()
+seen = set()
+proceed(st, seen, opts)
+print 'PASS (%d states)' % len(seen)
-- 
2.5.0


More information about the CRIU mailing list