[Devel] Re: [RFC][PATCH] clone_with_pids()^w eclone() for x86_64
Louis Rilling
Louis.Rilling at kerlabs.com
Thu Nov 19 13:29:05 PST 2009
On Thu, Nov 19, 2009 at 10:26:46PM +0100, Louis Rilling wrote:
> On Thu, Nov 19, 2009 at 09:48:49AM -0800, Dave Hansen wrote:
> > On Thu, 2009-11-19 at 10:58 +0100, Louis Rilling wrote:
> > > > int clone_with_pids(long flags_low, struct clone_args *clone_args, long args_size,
> > > > int *pids)
> > > > {
> > > > long retval;
> > > >
> > > > __asm__ __volatile__(
> > > > "movq %3, %%r10\n\t" /* pids in r10*/
> > > > "pushq %%rbp\n\t" /* save value of ebp */
> > > > :
> > > > :"D" (flags_low), /* rdi */
> > > > "S" (clone_args),/* rsi */
> > > > "d" (args_size), /* rdx */
> > > > "a" (pids) /* use rax, which gets moved to r10 */
> > > > );
> > >
> > > 1. The fourth C arg is not in rax, but in rcx.
> >
> > Hey Louis,
> >
> > So, try as I might, I couldn't get that to work. I thought it was rcx,
> > too.
> >
> > So, changing that instruction to:
> >
> > "movq %3, %%rcx\n\t" /* pids in r10*/
>
> Hm, no.
>
> I meant (without taking into account my other comments):
>
> __asm__ __volatile__(
> "movq %3, %%r10\n\t" /* pids in r10*/
> "pushq %%rbp\n\t" /* save value of ebp */
> :
> :"D" (flags_low), /* rdi */
> "S" (clone_args),/* rsi */
> "d" (args_size), /* rdx */
> "c" (pids) /* use rcx, which gets moved to r10 */
> );
>
> But actually this is even better :D:
>
> __asm__ __volatile__(
> "movq %3, %%r10\n\t" /* pids in r10*/
> "pushq %%rbp\n\t" /* save value of ebp */
> :
> :"D" (flags_low), /* rdi */
> "S" (clone_args),/* rsi */
> "d" (args_size), /* rdx */
> "r10" (pids) /* Linux reads its fourth arg from r10 */
> );
>
Grrrr, I sent the email too quickly! This is the better version:
__asm__ __volatile__(
"pushq %%rbp\n\t" /* save value of ebp */
:
:"D" (flags_low), /* rdi */
"S" (clone_args),/* rsi */
"d" (args_size), /* rdx */
"r10" (pids) /* Linux reads its fourth arg from r10 */
);
Louis
> >
> > and putting 0x11111, etc... in for the args the strace output for the
> > syscall looks like this:
> >
> > syscall_299(0x11111, 0x22222, 0x33333, 0x1, 0x1, 0x2, 0, 0, 0,
> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > 0, 0) = -1 (errno 22)
> >
> > and I get -EFAULT back from the function doing the copy_from_user() of
> > the pids argument, even when using good values.
> >
> > If I use the asm posted above, I get this:
> >
> > syscall_299(0x11111, 0x22222, 0x33333, 0x44444, 0x1, 0x2, 0, 0,
> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > 0, 0, 0) = -1 (errno 22)
> >
> > Or, this from a real call:
> >
> > syscall_299(0x1100011, 0x7fff19f0fd40, 0x38, 0x602070, 0x1, 0x2,
> > 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > 0, 0, 0, 0, 0[2992, 377]: Child:
> >
> > I had to find r10 basically by trial and error. I have no idea why it
> > works.
>
> r10 is used to pass the fourth arg to the kernel because the syscall instruction
> puts next rip (return address) in rcx. Using r10 instead of rcx is defined as part
> of Linux ABI for x86_64.
>
> For all the details, read the comments in
> arch/x86/kernel/entry_64.S:ENTRY(system_call).
>
> >
> > > >
> > > > __asm__ __volatile__(
> > > > "syscall\n\t" /* Linux/x86_64 system call */
> > > > "testq %0,%0\n\t" /* check return value */
> > > > "jne 1f\n\t" /* jump if parent */
> > > > "popq %%rbx\n\t" /* get subthread function */
> > > > "call *%%rbx\n\t" /* start subthread function */
> > > > "movq %2,%0\n\t"
> > > > "syscall\n" /* exit system call: exit subthread */
> > > > "1:\n\t"
> > > > "popq %%rbp\t" /* restore parent's ebp */
> > > > :"=a" (retval)
> > > > :"0" (__NR_clone3), "i" (__NR_exit)
> > > > :"ebx", "ecx", "edx"
> > > > );
> > >
> > > 2. You should probably not separate this into two asm statements. In particular,
> > > the compiler has no way to know that r10 should be preserved between the two
> > > statements, and may be confused by the change of rsp.
> >
> > Yeah, I wondered about that. Suka, we should probably fix your tests
> > and the i386 code, too.
> >
> > > 3. r10 and r11 should be listed as clobbered.
> >
> > D'oh! I didn't even touch the bottom registers because it continued to
> > work from the i386 version that I stole from Suka.
>
> That's again because of the syscall instruction, which saves EFLAGS to r11
> (and sysret restores EFLAGS from r11).
>
> >
> > > 4. I fail to see the magic that puts the subthread function pointer in the
> > > stack.
> > >
> > > 5. Maybe rdi should contain the subthread argument before calling the subthread?
> > >
> > > 6. rdi, rsi, rdx, rcx, r8 and r9 should be added to the clobber list because of
> > > the call to the subthread function.
> > >
> > > 7. rsi could be used in place of rbx to hold the function pointer, which would
> > > allow you to remove ebx from the clobber list.
> > >
> > > 8. I don't see why rbp should be saved. The ABI says it must be saved by the
> > > callee.
> > >
> > > 9. Before calling exit(), maybe put some exit code in rdi?
> >
> > Thanks for looking through this, Louis. I'll send out another version
> > in a bit.
>
> Thanks,
>
> Louis
>
> --
> Dr Louis Rilling Kerlabs
> Skype: louis.rilling Batiment Germanium
> Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
> http://www.kerlabs.com/ 35700 Rennes
--
Dr Louis Rilling Kerlabs
Skype: louis.rilling Batiment Germanium
Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes
http://www.kerlabs.com/ 35700 Rennes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 197 bytes
Desc: Digital signature
URL: <http://lists.openvz.org/pipermail/devel/attachments/20091119/54dbb194/attachment-0001.sig>
-------------- next part --------------
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
More information about the Devel
mailing list