[CRIU] [PATCH 3/7] seccomp: add support for SECCOMP_MODE_FILTER

Fri Nov 13 22:50:40 PST 2015

On 11/14/2015 12:58 AM, Tycho Andersen wrote:
> On Thu, Nov 12, 2015 at 10:58:22AM +0300, Pavel Emelyanov wrote:
>>
>>>>> +			continue;
>>>>> +		}
>>>>> +
>>>>> +		info = xmalloc(sizeof(*info));
>>>>> +		if (!info)
>>>>> +			goto out;
>>>>> +		seccomp_filter__init(&info->filter);
>>>>> +
>>>>> +		info->filter.filter.len = len * sizeof(struct sock_filter);
>>>>> +		info->filter.filter.data = xmalloc(info->filter.filter.len);
>>>>> +		if (!info->filter.filter.data)
>>>>> +			goto out;
>>>>> +
>>>>> +		memcpy(info->filter.filter.data, buf, info->filter.filter.len);
>>>>> +
>>>>> +		info->prev = infos;
>>>>> +		infos = info;
>>>>> +	}
>>>>> +
>>>>> +save_infos:
>>>>> +	info_count = i;
>>>>> +
>>>>> +	m = xrealloc(filters, sizeof(*filters) * (next_filter_id + info_count));
>>>>
>>>> Why do we allocate that many memory? Part of the filters where "inherited".
>>>> Can you describe what kind of data structure is built behind the filters
>>>> pointer? It's some tree, but I don't get how it exactly looks and how pstree
>>>> items point inside it.
>>>
>>> Yeah, this is a little strange. There are two sets of pointers to the
>>> filters. The first is the "tree" style from pstree, used for resolving
>>> inheritance relationships. The second set is this one, which is just a
>>> big array with everything. This is used for dumping (and freeing) the
>>> actual filters.
>>>
>>> We could ditch the tree style ones from pstree and just use the big
>>> array (and keep array ids in pstree), but it seemed a little nicer to
>>> me this way. But I could be convinced otherwise :)
>>
>> No no, keeping the tree relations inside the filters image and data structures
>> is nice. But as far as the "big array" thing is concerned --
>>
>>> --- /dev/null
>>> +++ b/protobuf/seccomp.proto
>>> @@ -0,0 +1,8 @@
>>> +message seccomp_filter {
>>> +	required bytes		filter			= 1;
>>> +	optional uint32		prev			= 2;
>>> +}
>>> +
>>> +message seccomp_entry {
>>> +	repeated seccomp_filter		seccomp_filters	= 1;
>>> +}
>>
>> why do we need the seccomp_entry? Why not just put seccomp_filter entries
>> into the image file one by one like it's done with pstree_entry-s?
> 
> You're right that we don't, I can rework it to get rid of that. We
> still need the "big array" just to be able to free things (since
> multiple pstrees can point to the same seccomp_filter and there is no
> way of marking them "freed").

Ah, I see. Yes, reading one object is simpler, than doing it in a loop.
OK, let's leave it this way.

>>>>> +int seccomp_filters_get_rst_pos(CoreEntry *core, int *count, unsigned long *pos)
>>>>> +{
>>>>> +	SeccompFilter *sf = NULL;
>>>>> +	struct sock_fprog *arr = NULL;
>>>>> +	int ret = -1, i;
>>>>> +
>>>>> +	if (!core->tc->has_seccomp_filter) {
>>>>> +		*count = 0;
>>>>> +		return 0;
>>>>> +	}
>>>>> +
>>>>> +	*count = 0;
>>>>> +	*pos = rst_mem_cpos(RM_PRIVATE);
>>>>> +
>>>>> +	BUG_ON(core->tc->seccomp_filter > se->n_seccomp_filters);
>>>>> +	sf = se->seccomp_filters[core->tc->seccomp_filter];
>>>>> +
>>>>> +	while (1) {
>>>>> +		(*count)++;
>>>>> +
>>>>> +		if (!sf->has_prev)
>>>>> +			break;
>>>>> +
>>>>> +		sf = se->seccomp_filters[sf->prev];
>>>>> +	}
>>>>> +
>>>>> +	arr = rst_mem_alloc(sizeof(*arr) * (*count), RM_PRIVATE);
>>>>> +	if (!arr)
>>>>> +		goto out;
>>>>> +
>>>>> +	sf = se->seccomp_filters[core->tc->seccomp_filter];
>>>>> +	for (i = 0; i < *count; i++) {
>>>>> +		struct sock_fprog *fprog = &arr[i];
>>>>> +		// XXX: this is a bit fugly; is there a better way?
>>>>
>>>> I didn't get a better way for what you were looking for :)
>>>
>>> The problem here is that we want to send a struct sock_fprog to the
>>> restorer blob, which itself has a pointer to the actual instruction
>>> array. So we want to pass a double pointer to the restorer blob. I
>>> could do something like pack the second array onto the end of the
>>> struct sock_fprog,
>>
>> But you do it already, don't you? After you've allocated the arr thing
>> in which sock_fprog-s are stored, you allocate more memory to the
>> fprog->filter and put the fprog there.
>>
>>> but it is nice to be able to refer to the first
>>> array without having to do math by hand.
>>>
>>> The solution here is to just save the rst_mem_alloc value into the
>>> struct where the second level pointer goes, and then remap the result
>>> after we remap the struct. That feels sort of broken, though :). But
>>> maybe not.
>>
>> Ah I see :) I can only one thing that comes to my mind, as you told above,
>> in the seccomp_filters_get_rst_pos in fprog->filter you can save offset
>> to fprog, omit the loop with remap_ptr-s in sigreturn_restore() and do
>> the "math by hand" in pie's restore_seccomp. This would save us one for()
>> loop and would keep all the logic in less code.
> 
> Less code here, but more code in the restorer blob where we have to do
> a bunch of pointer math to figure out where things live. I could be
> convinced that the extra pointer math was worth it, but I like this a
> little better. Anyway, up to you :)

I can be mistaken, but the extra math in restorer blob shouldn't be
longer then fprog = filter_mem + offset. Other than this, this extra
math will be in the same for () loop as the syscall setting it is,
so we save one loop. Thus can you please move this math into pie? :)

-- Pavel