[CRIU] [PATCH 1/3] seccomp: add initial support for SECCOMP_MODE_STRICT

Tue Jun 2 06:46:32 PDT 2015

On 06/02/2015 04:38 PM, Tycho Andersen wrote:
> On Tue, Jun 02, 2015 at 03:40:24PM +0300, Pavel Emelyanov wrote:
>> Cool :) And can you elaborate on why exactly the filtered mode cannot be
>> supported?
> 
> Yes, simply because we can't read the filters out of the kernel
> (they're not exposed to userland at all). I've mailed the seccomp
> maintainer about how to expose these, but he just had a baby and so is
> not very responsive right now. Hopefully I'll hear back about how to
> do this soon :)

Ah, I see.

>>> @@ -756,6 +758,36 @@ static int collect_helper_pids()
>>>  	return 0;
>>>  }
>>>  
>>> +/*
>>> + * Prepare the seccomp state for the process. Note that seccomp is restored in
>>> + * suspended mode, and then unsuspended after the restorer blob completes in
>>> + * finalize_restore().
>>> + */
>>
>> I'm not sure I get this. The prctl below will just turn the unsuspended strict mode
>> on. The ptrace-s in fork_with_pid() will try to suspend one, but this looks racy.
>> What prevents the prctl from being called _before_ the mentioned ptrace-s?
> 
> The ptraces in fork_with_pid happen during CR_STATE_FORKING (this
> state is finished in restore_task_with_children, but after
> create_children_and_session is called, which itself does more forking
> and ptracing in fork_with_pid), but the prctl here happen in
> CR_STATE_RESTORE. I think it should be ok.

Ah, so the ptrace suspend of seccomp works even if seccomp itself is not
turned on. OK :)

>>> @@ -1135,13 +1170,57 @@ static inline int fork_with_pid(struct pstree_item *item)
>>>  		goto err_unlock;
>>>  	}
>>>  
>>> -
>>>  	if (item == root_item) {
>>>  		item->pid.real = ret;
>>>  		pr_debug("PID: real %d virt %d\n",
>>>  				item->pid.real, item->pid.virt);
>>>  	}
>>>  
>>> +	if (ca.core->tc->seccomp_mode != SECCOMP_MODE_DISABLED) {
>>> +#ifdef CONFIG_HAS_SUSPEND_SECCOMP
>>
>> Can we avoid $ifdef-s in the middle of the code, please?
> 
> Sure, what's the usual pattern for doing this without them? I could
> look for EIO from PTRACE_SUSPEND_SECCOMP (indicating a bad ptrace
> command), unless you have some better way in mind?

Typically this is done like this

#ifdef FOO
static void foo(void)
{
	/* useful code here */
}
#else
static void foo(void)
{
	/* relevant stub */
}
#endif

...

static void bar(void)
{
	...
	foo();
	...
}

This way is also nice because foo() can be put into separate .c
file with some descriptive name.

>>> +		int status;
>>> +
>>> +		/*
>>> +		 * Since the process is going to restore seccomp state, we
>>> +		 * need to temporarily suspend seccomp before we restore this
>>> +		 * state, otherwise subsequent restore syscalls might be
>>> +		 * blocked. seccomp is resumed at the end in finalize_restore()
>>> +		 */
>>> +		if (ptrace(PTRACE_ATTACH, ret, NULL, NULL) < 0) {
>>> +			pr_perror("ptrace attach failed");
>>> +			kill(ret, SIGKILL);
>>> +			goto err_unlock;
>>> +		}
>>> +
>>> +		if (waitpid(ret, &status, 0) < 0) {
>>> +			pr_perror("waidpid failed");
>>> +			kill(ret, SIGKILL);
>>> +			goto err_unlock;
>>> +		}
>>> +
>>> +		if (!WIFSTOPPED(status)) {
>>> +			pr_err("not stopped after ptrace attach?\n");
>>> +			kill(ret, SIGKILL);
>>> +			goto err_unlock;
>>> +		}
>>> +
>>> +		if (ptrace(PTRACE_SUSPEND_SECCOMP, ret, NULL, 1) < 0) {
>>> +			pr_perror("seccomp suspend failed %d", ret);
>>> +			kill(ret, SIGKILL);
>>> +			goto err_unlock;
>>> +		}
>>> +
>>> +		if (ptrace(PTRACE_DETACH, ret, NULL, NULL) < 0) {
>>> +			pr_perror("ptrace detach failed");
>>> +			kill(ret, SIGKILL);
>>> +			goto err_unlock;
>>> +		}

Oh, btw, once you implement suspend-off on ptrace detach this place will
stop working.

-- Pavel