[Devel] Re: [PATCH 1/1] fill vdso with syscall32_setup_pages if TIF_IA32 on x86_64
Oren Laadan
orenl at cs.columbia.edu
Sat Feb 6 07:43:16 PST 2010
On Fri, 5 Feb 2010, Matt Helsley wrote:
> On Fri, Feb 05, 2010 at 08:04:12PM -0500, Oren Laadan wrote:
> >
> >
> > Serge E. Hallyn wrote:
> > > Quoting Oren Laadan (orenl at cs.columbia.edu):
> > >>
> > >> Serge E. Hallyn wrote:
> > >>> Quoting Oren Laadan (orenl at cs.columbia.edu):
> > >>>> Serge E. Hallyn wrote:
> > >>>>> Quoting Oren Laadan (orenl at cs.columbia.edu):
> > >>>>>> Cool !
> > >>>>>>
> > >>>>>> So what do we have working now for 64 bit kernel (for 32 bit kernel
> > >>>>>> we know it works...):
> > >>>>>>
> > >>>>>> 'restart' checkpointed
> > >>>>>> program program
> > >>>>>> ----------------------------------------
> > >>>>>> 64bit 64bit -> works
> > >>>>>> 32bit 32bit -> works
> > >>>>>>
> > >>>>>> 64bit 32bit -> ?????
> > >
> > > s/?????/Rejected/
> > >
> > > CKPT_ARCH_ID is of course different for X86_32 than X86_64, so
> > > we refuse restart in restore_read_header().
> > >
> > > -serge
> > >
> >
> > lol ... that's actually funny !
> >
> > Anyway, in light of the IRC discussions, here are the cases again:
> >
> >
> > original original restart target
> > program kernel program kernel
> > -------- --------- -------- --------
> > 64 bit 64 bit 64 bit 64 bit [0] works
> >
> > 32 bit 32 bit 32 bit 32 bit [0] works
> > 32 bit 64 bit 32 bit 64 bit [0] works
> >
> > 32 bit 32 bit 32 bit 64 bit [1]
> > 32 bit 64 bit 32 bit 32 bit [1]
> >
> > 32 bit any 64 bit 64 bit [2]
> > 64 bit 64 bit 32 bit 64 bit [2]
> >
> > [0] The first 3 cases are "homogeneous", with conditions equal at
> > checkpoint and restart. AFAIK, they work.
> >
> > [1] The next two cases consider 32 bit program, and vary only the
> > environment - the kernel may change from 32 to 64 or back. We want
> > them to work.
> >
> > IIUC, your comment above means that they don't work because the
> > CKPT_ARCH_ID is a mismatch. The fix should be trivial - either
> > make 'restart' modify it, or make the kernel tolerate it.
> >
> > [2] The last two cases consider the case when the restart program
> > itself has different bit-ness than the checkpointed program (and
> > transition may occur in either direction). While lower priority,
> > we would like this to work, too.
>
> Great table. Is it posted in the ckpt wiki too?
>
> http://ckpt.wiki.kernel.org
>
> I could take care of that for you if not. Perhaps it belongs under
> the "Checklist"?
Yep, I'll add it.
> > The question is whether the transition 64 -> 32 (or 32 ->64) from
> > the 'restart' program to the restarting task should happen in the
> > kernel as part of sys_restart(), or in user space using an execve()
> > syscall before calling sys_restart().
>
> The recent exec bug while switching personalities highlights the value,
> in my opinion, of keeping these transitions out of the restart syscall.
> There's great potential for nasty, long-term bugs in any code that
> deals with those kinds of switches. Keeping that code "in one place" is
> the best way to avoid adding similar bugs.
I agree with the concern.
There is even a stronger argument: doing it in user-space via exec
will rid the need to repeat the kernel code for different archs.
The only caveat is that restarting mixes 32- and 64-bit programs
will be slower because fo the exec calls. Then again, it can be
optimized when someone complains (and provides kernel code for this).
>
> > Doing so in user space is not trivial when threads are involved,
> > since the exec must then happen before the creation of threads (or
> > it will kill them). This will complicate the implementation of the
> > MakeForest() algorithm which relies on all all descendents seeing
> > the same data structures.
>
> True -- MakeForest is already rather complicated.
>
> As for seeing the same data structures across exec, perhaps we should
> keep an fd open across exec and read/map the table from that. That means
> converting from struct task* to indices in the table for one thing. I
> have some RFC patches for that. It also means the table contents have to
> use the same layout between 32 and 64-bit -- also quite easy.
>
> What I couldn't see was a good place to do the exec itself.
Can you send me the patches that you already have ?
There are a couple of more details:
* Checkpoint needs to also record the bit-ness of each process in the
tasks table
* On restart, if all tasks have same bit-ness, then one exec at most is
needed.
* Where to place the exec ? ideally for process (without threads)
it would happen right before sys_restart(), but for processes that
have threads it should happen right before the fork().
* Passing an fd is a good idea - anonymous shared memory should do
the trick.
* I'm a bit concerned about security, because 'restart' will likely
be a setuid utility:
1) need to ensure that we are exec'ing the right program (e.g. use
the /proc/self/exe link)
2) a new switch to 'restart' will say "use fd N for the data", but
a user may provide arbitrary data - can they do harm ? we'll
definitely need to be more cautious in handling the tasks array
because in this case we don't construct it ourselves.
Oren.
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
More information about the Devel
mailing list