[Devel] Re: [C/R ARM][PATCH 2/3] ARM: Add the eclone system call
Sukadev Bhattiprolu
sukadev at linux.vnet.ibm.com
Wed Mar 24 11:19:07 PDT 2010
Russell King - ARM Linux [linux at arm.linux.org.uk] wrote:
| > +asmlinkage int sys_eclone(unsigned flags_low, struct clone_args __user *uca,
| > + int args_size, pid_t __user *pids,
| > + struct pt_regs *regs)
| > +{
| > + int rc;
| > + struct clone_args kca;
| > + unsigned long flags;
| > + int __user *parent_tidp;
| > + int __user *child_tidp;
| > + unsigned long __user stack;
|
| __user on an integer type doesn't make any sense; integer types do not
| have address spaces.
Ah, will fix that for x86 32/64 bit implementations.
|
| > + unsigned long stack_size;
| > +
| > + rc = fetch_clone_args_from_user(uca, args_size, &kca);
| > + if (rc)
| > + return rc;
| > +
| > + /*
| > + * TODO: Convert 'clone-flags' to 64-bits on all architectures.
| > + * TODO: When ->clone_flags_high is non-zero, copy it in to the
| > + * higher word(s) of 'flags':
| > + *
| > + * flags = (kca.clone_flags_high << 32) | flags_low;
| > + */
| > + flags = flags_low;
| > + parent_tidp = (int *)(unsigned long)kca.parent_tid_ptr;
| > + child_tidp = (int *)(unsigned long)kca.child_tid_ptr;
|
| This will produce sparse errors. Is there a reason why 'clone_args'
| tid pointers aren't already pointers marked with __user ?
Making them pointers would make them 32-bit on some and 64-bit on other
architectures. We wanted the fields to be the same size on all architectures
for easier portability/extensibility. Given that we are copying-in a structure
from user-space, copying in a few extra bytes on the 32-bit architectures
would not be significant.
|
| > +
| > + stack_size = (unsigned long)kca.child_stack_size;
|
| Shouldn't this already be of integer type?
|
| > + if (stack_size)
| > + return -EINVAL;
|
| So the stack must have a zero size? Is this missing a '!' ?
Some architectures (IA64 ?) use the stack-size field. Those that don't
need it should error out. Again, its because we are trying to keep the
interface common across architectures.
|
| > +
| > + stack = (unsigned long)kca.child_stack;
| > + if (!stack)
| > + stack = regs->ARM_sp;
| > +
| > + return do_fork_with_pids(flags, stack, regs, stack_size, parent_tidp,
| > + child_tidp, kca.nr_pids, pids);
|
| Hmm, so let me get this syscall interface right. We have some arguments
| passed in registers and others via a (variable sized?) structure. It seems
| really weird to have, eg, a pointer to the pids and the number of pids
| passed in two separate ways.
|
| The grouping between what's passed in registers and via this clone_args
| structure seems to be random. Can it be sanitized?
:-) Well, we went through a lot of discussions on this. Here is one pointer
http://lkml.org/lkml/2009/9/11/92 (and that was version 6 of 13+ :-).
We wanted the first parameter of eclone(), flags, to remain 32-bit value to
avoid confusing user-space. (i.e if they accidentally pass a 64-bit flags
value to the clone() system call which takes 32-bit flags, the higher-32
bits would silently be dropped).
By sticking the higher clone-flags in 'struct clone_args', all architectures
would have to do the same extra work to set the higher flags so less
chance of error.
Re: passing the number of pids, nr_pids in the structure, I think it was
just that by passing it in the structure, we could avoid using an extra
register for the system call parameter.
'pid_t *' would be of different size on different architecutres. Passing
it as a separate parameter would avoid the pointer conversion. Note
that unlike the tid pointers above which are a few individual fields,
the pid_t array could in theory be large.
Hope that helps. Thanks the review comments.
Sukadev
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
More information about the Devel
mailing list