[CRIU] Re: [PATCH 08/10] syscalls: Complete redesign v4

Pavel Emelyanov xemul at parallels.com
Mon Apr 16 06:57:34 EDT 2012


On 04/15/2012 12:43 AM, Cyrill Gorcunov wrote:
> In the early days we used only a few syscalls, which,
> together with debug compiler options, always produced
> relative addresses for memory variables used in the
> parasite and restorer blobs. Thus it went unnoticed
> that there is something wrong with the syscall declarations
> we use.
> 
> Basically, all our syscalls are just wrappers over inline
> assembly code of the form
> 
> static long syscall2(int nr, long arg0, long arg1)
> {
>         long ret;
>         asm volatile(
>                 "movl %1, %%eax         \t\n"
>                 "movq %2, %%rdi         \t\n"
>                 "movq %3, %%rsi         \t\n"
>                 "syscall                \t\n"
>                 "movq %%rax, %0         \t\n"
>                 : "=r"(ret)
>                 : "g" ((int)nr), "g" (arg0), "g" (arg1)
>                 : "rax", "rdi", "rsi", "memory");
>         return ret;
> }
> 
> so every argument is treated as a plain long (even if the call
> semantics imply that a memory address is passed rather than a
> direct integer value) and transferred via a general-purpose
> register.
> 
> As mentioned, this caused no problems when debug options were
> specified at compile time: the compiler does not try to optimize
> addressing but generates code which always computes addresses.
> 
> The situation changes if one builds crtools with
> optimization enabled -- the compiler sees that the arguments
> are plain long numbers and might pass direct addresses
> of variables instead of generating relative addresses
> (because the function declarations have no pointers and 'g' is
>  used together with 'mov', which is of course wrong).
> 
> To fix all this, syscall declarations are now generated from
> the syscall.def file and function arguments are passed in
> conformance with the x86-64 ABI.
> 
> This shrinks the amount of source code needed to declare syscalls
> and opens the way to using optimization.
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov at openvz.org>
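
For reference, the constraint problem described above can be avoided by
pinning each operand to the register the kernel expects, so that no
'g' operand ever meets a 'mov'. The sketch below is NOT what the patch
does (the patch generates assembly stubs from syscall.def); it only
illustrates the ABI side. raw_syscall3() is a made-up name, and the
non-x86-64 branch is an assumption added so the snippet builds elsewhere.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>	/* SYS_getpid, SYS_write */
#include <unistd.h>		/* syscall(), used only by the fallback branch */

/*
 * Hypothetical helper, not taken from the patch.
 *
 * On x86-64 Linux the kernel takes the syscall number in rax and the
 * first three arguments in rdi, rsi and rdx. The "a", "D", "S" and "d"
 * constraints make the compiler load exactly those registers, so no
 * explicit mov with a free-form operand is needed. The syscall
 * instruction itself clobbers rcx and r11.
 */
static long raw_syscall3(long nr, long a0, long a1, long a2)
{
#if defined(__x86_64__) && defined(__linux__)
	long ret;

	__asm__ __volatile__("syscall"
			     : "=a" (ret)
			     : "a" (nr), "D" (a0), "S" (a1), "d" (a2)
			     : "rcx", "r11", "memory");
	return ret;	/* a negative errno value on failure */
#else
	/* Assumed fallback: libc wrapper, returns -1 and sets errno. */
	return syscall(nr, a0, a1, a2);
#endif
}
```

With such a helper, raw_syscall3(SYS_getpid, 0, 0, 0) should return the
same value as getpid(), and a call with a bad descriptor, for instance
raw_syscall3(SYS_write, -1, 0, 0), should return a negative value.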

I have 2 concerns about this patch.

1. It mixes cleanups with payload (e.g. fixing up args for syscalls or
   moving structures/constants to a header). Please split.
2. The syscall generator is written in Perl. What is Perl here for?
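
To make point 2 concrete: as far as the description goes, each
syscall.def row carries a __NR_ name, a number, a function name and an
argument list, and the generator turns that into a define plus a
prototype. A rough sketch of that per-row step in C follows. gen_decl()
and the "extern long" return type are assumptions for illustration only;
the real syscalls.pl also emits the assembly stubs, which this skips.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: convert one syscall.def
 * row such as
 *
 *   __NR_close   3   sys_close   (int fd)
 *
 * into a "#define" line (def) and a C prototype (proto).
 * Returns 0 on success, -1 for comments, blank or malformed rows.
 */
static int gen_decl(const char *row, char *def, size_t def_sz,
		    char *proto, size_t proto_sz)
{
	char nr_name[64], fn_name[64];
	const char *args;
	int code, used = 0;

	if (row[0] == '#' || row[0] == '\n' || row[0] == '\0')
		return -1;

	/* %n records where the argument list starts within the row. */
	if (sscanf(row, "%63s %d %63s %n", nr_name, &code, fn_name, &used) < 3)
		return -1;

	args = row + used;
	if (args[0] != '(')
		return -1;	/* row has no argument list */

	snprintf(def, def_sz, "#define %s %d", nr_name, code);

	/* Copy the argument list up to, but not including, the newline. */
	snprintf(proto, proto_sz, "extern long %s%.*s;",
		 fn_name, (int)strcspn(args, "\n"), args);
	return 0;
}
```

Fed the __NR_close row, this yields "#define __NR_close 3" and
"extern long sys_close(int fd);".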

> ---
>  Makefile                |    9 +-
>  Makefile.inc            |    1 +
>  Makefile.pie            |   18 +-
>  Makefile.syscall        |   37 ++++
>  cr-restore.c            |    2 +-
>  include/syscall-codes.h |   63 -------
>  include/syscall-types.h |   52 ++++++
>  include/syscall.def     |   58 ++++++
>  include/syscall.h       |  444 -----------------------------------------------
>  parasite.c              |   14 +-
>  restorer.c              |   16 +-
>  syscall-common.S        |   16 ++
>  syscalls.pl             |   51 ++++++
>  13 files changed, 245 insertions(+), 536 deletions(-)
>  create mode 100644 Makefile.syscall
>  delete mode 100644 include/syscall-codes.h
>  create mode 100644 include/syscall-types.h
>  create mode 100644 include/syscall.def
>  delete mode 100644 include/syscall.h
>  create mode 100644 syscall-common.S
>  create mode 100644 syscalls.pl
> 
> diff --git a/Makefile b/Makefile
> index dea0745..03532e1 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -48,6 +48,7 @@ OBJS          += ipc_ns.o
> 
>  DEPS           := $(patsubst %.o,%.d,$(OBJS))
> 
> +-include Makefile.syscall
>  -include Makefile.pie
> 
>  all: $(PROGRAM)
> @@ -64,11 +65,11 @@ all: $(PROGRAM)
>         $(E) "  CC      " $@
>         $(Q) $(CC) -S $(CFLAGS) -fverbose-asm $< -o $@
> 
> -$(PROGRAM): $(OBJS) | $(PIE-GEN)
> +$(PROGRAM): $(OBJS) | $(SYS-OBJ) $(PIE-GEN)
>         $(E) "  LINK    " $@
> -       $(Q) $(CC) $(CFLAGS) $(OBJS) $(LIBS) -o $@
> +       $(Q) $(CC) $(CFLAGS) $(OBJS) $(SYS-OBJ) $(LIBS) -o $@
> 
> -%.d: %.c | $(PIE-GEN)
> +%.d: %.c | $(SYS-OBJ) $(PIE-GEN)
>         $(Q) $(CC) -M -MT $(patsubst %.d,%.o,$@) $(CFLAGS) $< -o $@
> 
>  test-legacy: $(PROGRAM)
> @@ -90,7 +91,7 @@ rebuild:
>         $(Q) $(MAKE)
>  .PHONY: rebuild
> 
> -clean: cleanpie
> +clean: cleanpie cleansyscall
>         $(E) "  CLEAN"
>         $(Q) $(RM) -f ./*.o
>         $(Q) $(RM) -f ./*.d
> diff --git a/Makefile.inc b/Makefile.inc
> index ffa02d9..c755a38 100644
> --- a/Makefile.inc
> +++ b/Makefile.inc
> @@ -19,6 +19,7 @@ NM            := nm
>  AWK            := awk
>  SH             := sh
>  MAKE           := make
> +PERL           := perl
> 
>  # Additional ARCH settings for x86
>  ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/i386/ -e s/sun4u/sparc64/ \
> diff --git a/Makefile.pie b/Makefile.pie
> index 5f0076d..8f04dc0 100644
> --- a/Makefile.pie
> +++ b/Makefile.pie
> @@ -26,37 +26,37 @@ DEPS                += $(patsubst %.o,%.d,$(ROBJS))
>  PIEFLAGS       := -fpie
>  ASMFLAGS       := -D__ASSEMBLY__
> 
> -$(PASM-OBJS): $(PASM-SRC)
> +$(PASM-OBJS): $(PASM-SRC) $(SYS-OBJ)
>         $(E) "  CC      " $@
>         $(Q) $(CC) -c $(ASMFLAGS) $(CFLAGS) $(PIEFLAGS) $(patsubst %.o,%.S,$@) -o $@
> 
> -$(POBJS): $(PSRCS) $(PASM-OBJS)
> +$(POBJS): $(PSRCS) $(PASM-OBJS) $(SYS-OBJ)
>         $(E) "  CC      " $@
>         $(Q) $(CC) -c $(CFLAGS) $(PIEFLAGS) $(patsubst %.o,%.c,$@) -o $@
> 
> -parasite-util-net.o: util-net.c
> +parasite-util-net.o: util-net.c $(SYS-OBJ)
>         $(E) "  CC      " $@
>         $(Q) $(CC) -c $(CFLAGS) $(PIEFLAGS) $< -o $@
> 
>  POBJS          += parasite-util-net.o
> 
> -$(PBLOB-BIN): $(PBLOB-LDS) $(POBJS) $(PASM-OBJS)
> +$(PBLOB-BIN): $(PBLOB-LDS) $(POBJS)
>         $(E) "  GEN     " $@
> -       $(Q) $(LD) --oformat=binary -T $(PBLOB-LDS) -o $(PBLOB-BIN) $(POBJS) $(PASM-OBJS)
> -       $(Q) $(LD) --oformat=elf64-x86-64 -T $(PBLOB-LDS) -o $(PBLOB-BIN).o $(POBJS) $(PASM-OBJS)
> +       $(Q) $(LD) --oformat=binary -T $(PBLOB-LDS) -o $(PBLOB-BIN) $(POBJS) $(PASM-OBJS) $(SYS-OBJ)
> +       $(Q) $(LD) --oformat=elf64-x86-64 -T $(PBLOB-LDS) -o $(PBLOB-BIN).o $(POBJS) $(PASM-OBJS) $(SYS-OBJ)
> 
>  $(PBLOB-HDR): $(PBLOB-BIN) $(GEN-OFFSETS)
>         $(E) "  GEN     " $@
>         $(Q) $(SH) $(GEN-OFFSETS) $(PBLOB-NAME) > $@ || rm -f $@
> 
> -$(ROBJS): $(RSRCS)
> +$(ROBJS): $(RSRCS) $(SYS-OBJ)
>         $(E) "  CC      " $@
>         $(Q) $(CC) -c $(CFLAGS) $(PIEFLAGS) $(patsubst %.o,%.c,$@) -o $@
> 
>  $(RBLOB-BIN): $(RBLOB-LDS) $(ROBJS)
>         $(E) "  GEN     " $@
> -       $(Q) $(LD) --oformat=binary -T $(RBLOB-LDS) -o $(RBLOB-BIN) $(ROBJS)
> -       $(Q) $(LD) --oformat=elf64-x86-64 -T $(RBLOB-LDS) -o $(RBLOB-BIN).o $(ROBJS)
> +       $(Q) $(LD) --oformat=binary -T $(RBLOB-LDS) -o $(RBLOB-BIN) $(ROBJS) $(SYS-OBJ)
> +       $(Q) $(LD) --oformat=elf64-x86-64 -T $(RBLOB-LDS) -o $(RBLOB-BIN).o $(ROBJS) $(SYS-OBJ)
> 
>  $(RBLOB-HDR): $(RBLOB-BIN) $(GEN-OFFSETS)
>         $(E) "  GEN     " $@
> diff --git a/Makefile.syscall b/Makefile.syscall
> new file mode 100644
> index 0000000..e7e5ae2
> --- /dev/null
> +++ b/Makefile.syscall
> @@ -0,0 +1,37 @@
> +SYS-DEF                := include/syscall.def
> +SYS-ASM-COMMON := syscall-common.S
> +SYS-TYPES      := include/syscall-types.h
> +
> +SYS-CODES      := include/syscall-codes.h
> +SYS-PROTO      := include/syscall.h
> +
> +SYS-ASM                := syscall.S
> +SYS-GEN                := syscalls.pl
> +
> +SYS-OBJ                := $(patsubst %.S,%.o,$(SYS-ASM))
> +
> +SYS-FLAGS      := -pie -Wstrict-prototypes -D__ASSEMBLY__ -nostdlib -fomit-frame-pointer
> +
> +$(SYS-ASM): $(SYS-GEN) $(SYS-DEF) $(SYS-ASM-COMMON) $(SYS-TYPES)
> +       $(E) "  GEN     " $@
> +       $(Q) $(PERL)                    \
> +               $(SYS-GEN)              \
> +               $(SYS-DEF)              \
> +               $(SYS-CODES)            \
> +               $(SYS-PROTO)            \
> +               $(SYS-ASM)              \
> +               $(SYS-ASM-COMMON)       \
> +               $(SYS-TYPES)
> +
> +$(SYS-OBJ): $(SYS-ASM)
> +       $(E) "  CC      " $@.prelim
> +       $(Q) $(CC) -c $(CFLAGS) $(SYS-FLAGS)  $< -o $@.prelim
> +       $(E) "  LD      " $@
> +       $(Q) $(LD) --oformat=elf64-x86-64 -T $(PBLOB-LDS) $@.prelim -o $@
> +
> +cleansyscall:
> +       $(E) "  CLEAN SYSCALLS"
> +       $(Q) $(RM) -f ./$(SYS-ASM)
> +       $(Q) $(RM) -f ./$(SYS-CODES)
> +       $(Q) $(RM) -f ./$(SYS-PROTO)
> +       $(Q) $(RM) -f ./*.prelim
> diff --git a/cr-restore.c b/cr-restore.c
> index 6f4bcad..bf0e42b 100644
> --- a/cr-restore.c
> +++ b/cr-restore.c
> @@ -485,7 +485,7 @@ static int prepare_sigactions(int pid)
>                  * A pure syscall is used, because glibc
>                  * sigaction overwrites se_restorer.
>                  */
> -               ret = sys_sigaction(sig, &act, &oact);
> +               ret = sys_sigaction(sig, &act, &oact, sizeof(rt_sigset_t));
>                 if (ret == -1) {
>                         pr_err("%d: Can't restore sigaction: %m\n", pid);
>                         goto err;
> diff --git a/include/syscall-codes.h b/include/syscall-codes.h
> deleted file mode 100644
> index b7ed848..0000000
> --- a/include/syscall-codes.h
> +++ /dev/null
> @@ -1,63 +0,0 @@
> -#ifndef CR_SYSCALL_CODES_H_
> -#define CR_SYSCALL_CODES_H_
> -
> -#ifdef CONFIG_X86_64
> -
> -#define __NR_read              0
> -#define __NR_write             1
> -#define __NR_open              2
> -#define __NR_close             3
> -#define __NR_lseek             8
> -#define __NR_mmap              9
> -#define __NR_mprotect          10
> -#define __NR_munmap            11
> -#define __NR_brk               12
> -#define __NR_rt_sigaction      13
> -#define __NR_rt_sigprocmask    14
> -#define __NR_rt_sigreturn      15
> -#define __NR_mincore           27
> -#define __NR_shmat             30
> -#define __NR_dup               32
> -#define __NR_dup2              33
> -#define __NR_pause             34
> -#define __NR_nanosleep         35
> -#define __NR_getitimer         36
> -#define __NR_setitimer         38
> -#define __NR_getpid            39
> -#define __NR_socket            41
> -#define __NR_sendmsg           46
> -#define __NR_recvmsg           47
> -#define __NR_bind              49
> -#define __NR_setsockopt                54
> -#define __NR_getsockopt                55
> -#define __NR_clone             56
> -#define __NR_exit              60
> -#define __NR_wait4             61
> -#define __NR_kill              62
> -#define __NR_fcntl             72
> -#define __NR_flock             73
> -#define __NR_unlink            87
> -#define __NR_setresuid         117
> -#define __NR_setresgid         119
> -#define __NR_setfsuid          122
> -#define __NR_setfsgid          123
> -#define __NR_capset            126
> -#define __NR_tgkill            131
> -#define __NR__sysctl           156
> -#define __NR_prctl             157
> -#define __NR_arch_prctl                158
> -#define __NR_gettid            186
> -#define __NR_futex             202
> -#define __NR_set_thread_area   205
> -#define __NR_get_thread_area   211
> -#define __NR_set_tid_address   218
> -#define __NR_restart_syscall   219
> -#define __NR_msync             227
> -#define __NR_setns             308
> -#define __NR_kcmp              312
> -
> -#else /* CONFIG_X86_64 */
> -# error x86-32 bit mode not yet implemented
> -#endif /* CONFIG_X86_64 */
> -
> -#endif /* CR_SYSCALL_CODES_H_ */
> diff --git a/include/syscall-types.h b/include/syscall-types.h
> new file mode 100644
> index 0000000..e3160a3
> --- /dev/null
> +++ b/include/syscall-types.h
> @@ -0,0 +1,52 @@
> +/*
> + * Please add here type definitions if
> + * syscall prototypes need them.
> + *
> + * Anything else should go to plain type.h
> + */
> +
> +#ifndef SYSCALL_TYPES_H__
> +#define SYSCALL_TYPES_H__
> +
> +#include <sys/types.h>
> +#include <sys/time.h>
> +#include <arpa/inet.h>
> +#include <fcntl.h>
> +
> +#include "types.h"
> +#include "compiler.h"
> +
> +#ifndef CONFIG_X86_64
> +# error x86-32 bit mode not yet implemented
> +#endif
> +
> +struct cap_header {
> +       u32 version;
> +       int pid;
> +};
> +
> +struct cap_data {
> +       u32 eff;
> +       u32 prm;
> +       u32 inh;
> +};
> +
> +struct sockaddr;
> +struct msghdr;
> +struct rusage;
> +
> +#ifndef CLONE_NEWPID
> +#define CLONE_NEWPID   0x20000000
> +#endif
> +
> +#ifndef CLONE_NEWUTS
> +#define CLONE_NEWUTS   0x04000000
> +#endif
> +
> +#ifndef CLONE_NEWIPC
> +#define CLONE_NEWIPC   0x08000000
> +#endif
> +
> +#define setns  sys_setns
> +
> +#endif /* SYSCALL_TYPES_H__ */
> diff --git a/include/syscall.def b/include/syscall.def
> new file mode 100644
> index 0000000..50c0926
> --- /dev/null
> +++ b/include/syscall.def
> @@ -0,0 +1,58 @@
> +#
> +# System calls table; please make sure the table contains only the syscalls
> +# really used somewhere in the project.
> +#
> +# The template is (name and arguments are optional if you need only __NR_x
> +# defined, but no real entry point in the syscalls lib).
> +#
> +# name                 code            name                    arguments
> +# -----------------------------------------------------------------------
> +#
> +__NR_read              0               sys_read                (int fd, void *buf, unsigned long count)
> +__NR_write             1               sys_write               (int fd, const void *buf, unsigned long count)
> +__NR_open              2               sys_open                (const char *filename, unsigned long flags, unsigned long mode)
> +__NR_close             3               sys_close               (int fd)
> +__NR_lseek             8               sys_lseek               (int fd, unsigned long offset, unsigned long origin)
> +__NR_mmap              9               sys_mmap                (void *addr, unsigned long len, unsigned long prot, unsigned long flags, unsigned long fd, unsigned long offset)
> +__NR_mprotect          10              sys_mprotect            (const void *addr, unsigned long len, unsigned long prot)
> +__NR_munmap            11              sys_munmap              (void *addr, unsigned long len)
> +__NR_brk               12              sys_brk                 (void *addr)
> +__NR_rt_sigaction      13              sys_sigaction           (int signum, const rt_sigaction_t *act, rt_sigaction_t *oldact, size_t sigsetsize)
> +__NR_rt_sigprocmask    14              sys_sigprocmask         (int how, k_rtsigset_t *set, k_rtsigset_t *old, size_t sigsetsize)
> +__NR_rt_sigreturn      15              sys_rt_sigreturn        (void)
> +__NR_mincore           27              sys_mincore             (void *addr, unsigned long size, unsigned char *vec)
> +__NR_shmat             30              sys_shmat               (int shmid, void *shmaddr, int shmflag)
> +__NR_pause             34              sys_pause               (void)
> +__NR_nanosleep         35              sys_nanosleep           (struct timespec *req, struct timespec *rem)
> +__NR_getitimer         36              sys_getitimer           (int which, const struct itimerval *val)
> +__NR_setitimer         38              sys_setitimer           (int which, const struct itimerval *val, struct itimerval *old)
> +__NR_getpid            39              sys_getpid              (void)
> +__NR_socket            41              sys_socket              (int domain, int type, int protocol)
> +__NR_sendmsg           46              sys_sendmsg             (int sockfd, const struct msghdr *msg, int flags)
> +__NR_recvmsg           47              sys_recvmsg             (int sockfd, struct msghdr *msg, int flags)
> +__NR_bind              49              sys_bind                (int sockfd, const struct sockaddr *addr, int addrlen)
> +__NR_setsockopt                54              sys_setsockopt          (int sockfd, int level, int optname, const void *optval, socklen_t optlen)
> +__NR_getsockopt                55              sys_getsockopt          (int sockfd, int level, int optname, const void *optval, socklen_t *optlen)
> +__NR_clone             56              sys_clone               (unsigned long flags, void *child_stack, void *parent_tid, void *child_tid)
> +__NR_exit              60              sys_exit                (unsigned long error_code)
> +__NR_wait4             61              sys_waitpid             (int pid, int *status, int options, struct rusage *ru)
> +__NR_kill              62              sys_kill                (long pid, int sig)
> +__NR_fcntl             72              sys_fcntl               (int fd, int type, long arg)
> +__NR_flock             73              sys_flock               (int fd, unsigned long cmd)
> +__NR_unlink            87              sys_unlink              (char *pathname)
> +__NR_setresuid         117             sys_setresuid           (int uid, int euid, int suid)
> +__NR_setresgid         119             sys_setresgid           (int gid, int egid, int sgid)
> +__NR_setfsuid          122             sys_setfsuid            (int fsuid)
> +__NR_setfsgid          123             sys_setfsgid            (int fsgid)
> +__NR_capset            126             sys_capset              (struct cap_header *h, struct cap_data *d)
> +__NR_personality       135             sys_personality         (unsigned int personality)
> +__NR_prctl             157             sys_prctl               (int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5)
> +__NR_arch_prctl                158             sys_arch_prctl          (int option, unsigned long addr)
> +__NR_gettid            186             sys_gettid              (void)
> +__NR_futex             202             sys_futex               (u32 *uaddr, int op, u32 val, struct timespec *utime, u32 *uaddr2, u32 val3)
> +__NR_set_thread_area   205             sys_set_thread_area     (user_desc_t *info)
> +__NR_get_thread_area   211             sys_get_thread_area     (user_desc_t *info)
> +__NR_set_tid_address   218             sys_set_tid_address     (int *tid_addr)
> +__NR_restart_syscall   219             sys_restart_syscall     (void)
> +__NR_setns             308             sys_setns               (int fd, int nstype)
> +__NR_kcmp              312             sys_kcmp                (pid_t pid1, pid_t pid2, int type, unsigned long idx1, unsigned long idx2)
> diff --git a/include/syscall.h b/include/syscall.h
> deleted file mode 100644
> index 5baaf7f..0000000
> --- a/include/syscall.h
> +++ /dev/null
> @@ -1,444 +0,0 @@
> -#ifndef CR_SYSCALL_H_
> -#define CR_SYSCALL_H_
> -
> -#include <sys/types.h>
> -#include <sys/time.h>
> -#include <arpa/inet.h>
> -
> -#include "types.h"
> -#include "compiler.h"
> -#include "syscall-codes.h"
> -
> -#ifdef CONFIG_X86_64
> -
> -static always_inline long syscall0(int nr)
> -{
> -       long ret;
> -       asm volatile(
> -               "movl %1, %%eax         \t\n"
> -               "syscall                \t\n"
> -               "movq %%rax, %0         \t\n"
> -               : "=r"(ret)
> -               : "g" ((int)nr)
> -               : "rax", "memory");
> -       return ret;
> -}
> -
> -static always_inline long syscall1(int nr, unsigned long arg0)
> -{
> -       long ret;
> -       asm volatile(
> -               "movl %1, %%eax         \t\n"
> -               "movq %2, %%rdi         \t\n"
> -               "syscall                \t\n"
> -               "movq %%rax, %0         \t\n"
> -               : "=r"(ret)
> -               : "g" ((int)nr), "g" (arg0)
> -               : "rax", "rdi", "memory");
> -       return ret;
> -}
> -
> -static always_inline long syscall2(int nr, unsigned long arg0, unsigned long arg1)
> -{
> -       long ret;
> -       asm volatile(
> -               "movl %1, %%eax         \t\n"
> -               "movq %2, %%rdi         \t\n"
> -               "movq %3, %%rsi         \t\n"
> -               "syscall                \t\n"
> -               "movq %%rax, %0         \t\n"
> -               : "=r"(ret)
> -               : "g" ((int)nr), "g" (arg0), "g" (arg1)
> -               : "rax", "rdi", "rsi", "memory");
> -       return ret;
> -}
> -
> -static always_inline long syscall3(int nr, unsigned long arg0, unsigned long arg1,
> -                                  unsigned long arg2)
> -{
> -       long ret;
> -       asm volatile(
> -               "movl %1, %%eax         \t\n"
> -               "movq %2, %%rdi         \t\n"
> -               "movq %3, %%rsi         \t\n"
> -               "movq %4, %%rdx         \t\n"
> -               "syscall                \t\n"
> -               "movq %%rax, %0         \t\n"
> -               : "=r"(ret)
> -               : "g" ((int)nr), "g" (arg0), "g" (arg1), "g" (arg2)
> -               : "rax", "rdi", "rsi", "rdx", "memory");
> -       return ret;
> -}
> -
> -static always_inline long syscall4(int nr, unsigned long arg0, unsigned long arg1,
> -                                  unsigned long arg2, unsigned long arg3)
> -{
> -       long ret;
> -       asm volatile(
> -               "movl %1, %%eax         \t\n"
> -               "movq %2, %%rdi         \t\n"
> -               "movq %3, %%rsi         \t\n"
> -               "movq %4, %%rdx         \t\n"
> -               "movq %5, %%r10         \t\n"
> -               "syscall                \t\n"
> -               "movq %%rax, %0         \t\n"
> -               : "=r"(ret)
> -               : "g" ((int)nr), "g" (arg0), "g" (arg1), "g" (arg2),
> -                       "g" (arg3)
> -               : "rax", "rdi", "rsi", "rdx", "r10", "memory");
> -       return ret;
> -}
> -
> -static long always_inline syscall5(int nr, unsigned long arg0, unsigned long arg1,
> -                                  unsigned long arg2, unsigned long arg3,
> -                                  unsigned long arg4)
> -{
> -       long ret;
> -       asm volatile(
> -               "movl %1, %%eax         \t\n"
> -               "movq %2, %%rdi         \t\n"
> -               "movq %3, %%rsi         \t\n"
> -               "movq %4, %%rdx         \t\n"
> -               "movq %5, %%r10         \t\n"
> -               "movq %6, %%r8          \t\n"
> -               "syscall                \t\n"
> -               "movq %%rax, %0         \t\n"
> -               : "=r"(ret)
> -               : "g" ((int)nr), "g" (arg0), "g" (arg1), "g" (arg2),
> -                       "g" (arg3), "g" (arg4)
> -               : "rax", "rdi", "rsi", "rdx", "r10", "r8", "memory");
> -       return ret;
> -}
> -
> -static long always_inline syscall6(int nr, unsigned long arg0, unsigned long arg1,
> -                                  unsigned long arg2, unsigned long arg3,
> -                                  unsigned long arg4, unsigned long arg5)
> -{
> -       long ret;
> -       asm volatile(
> -               "movl %1, %%eax         \t\n"
> -               "movq %2, %%rdi         \t\n"
> -               "movq %3, %%rsi         \t\n"
> -               "movq %4, %%rdx         \t\n"
> -               "movq %5, %%r10         \t\n"
> -               "movq %6, %%r8          \t\n"
> -               "movq %7, %%r9          \t\n"
> -               "syscall                \t\n"
> -               "movq %%rax, %0         \t\n"
> -               : "=r"(ret)
> -               : "g" ((int)nr), "g" (arg0), "g" (arg1), "g" (arg2),
> -                       "g" (arg3), "g" (arg4), "g" (arg5)
> -               : "rax", "rdi", "rsi", "rdx", "r10", "r8", "r9", "memory");
> -       return ret;
> -}
> -
> -static always_inline unsigned long sys_pause(void)
> -{
> -       return syscall0(__NR_pause);
> -}
> -
> -static always_inline unsigned long sys_shmat(int shmid, void *shmaddr, int shmflag)
> -{
> -       return syscall3(__NR_shmat, shmid, (unsigned long)shmaddr, shmflag);
> -}
> -static always_inline unsigned long sys_mmap(void *addr, unsigned long len, unsigned long prot,
> -                                           unsigned long flags, unsigned long fd, unsigned long offset)
> -{
> -       return syscall6(__NR_mmap, (unsigned long)addr,
> -                       len, prot, flags, fd, offset);
> -}
> -
> -static always_inline unsigned long sys_munmap(void *addr,unsigned long len)
> -{
> -       return syscall2(__NR_munmap, (unsigned long)addr, len);
> -}
> -
> -static always_inline long sys_open(const char *filename, unsigned long flags, unsigned long mode)
> -{
> -       return syscall3(__NR_open, (unsigned long)filename, flags, mode);
> -}
> -
> -static always_inline long sys_sigaction(int signum, const rt_sigaction_t *act, rt_sigaction_t *oldact)
> -{
> -       return syscall4(__NR_rt_sigaction, signum, (unsigned long)act, (unsigned long)oldact, sizeof(rt_sigset_t));
> -}
> -
> -static always_inline long sys_getitimer(int which, const struct itimerval *val)
> -{
> -       return syscall2(__NR_getitimer, (unsigned long)which, (unsigned long)val);
> -}
> -
> -static always_inline long sys_setitimer(int which, const struct itimerval *val, struct itimerval *old)
> -{
> -       return syscall3(__NR_setitimer, (unsigned long)which, (unsigned long)val, (unsigned long)old);
> -}
> -
> -static always_inline long sys_close(int fd)
> -{
> -       return syscall1(__NR_close, fd);
> -}
> -
> -static always_inline long sys_write(unsigned long fd, const void *buf, unsigned long count)
> -{
> -       return syscall3(__NR_write, fd, (unsigned long)buf, count);
> -}
> -
> -static always_inline long sys_mincore(unsigned long addr, unsigned long size, void *vec)
> -{
> -       return syscall3(__NR_mincore, addr, size, (unsigned long)vec);
> -}
> -
> -static always_inline long sys_lseek(unsigned long fd, unsigned long offset, unsigned long origin)
> -{
> -       return syscall3(__NR_lseek, fd, offset, origin);
> -}
> -
> -static always_inline long sys_mprotect(unsigned long start, unsigned long len, unsigned long prot)
> -{
> -       return syscall3(__NR_mprotect, start, len, prot);
> -}
> -
> -static always_inline long sys_nanosleep(struct timespec *req, struct timespec *rem)
> -{
> -       return syscall2(__NR_nanosleep, (unsigned long)req, (unsigned long)rem);
> -}
> -
> -static always_inline long sys_read(unsigned long fd, void *buf, unsigned long count)
> -{
> -       return syscall3(__NR_read, fd, (unsigned long)buf, count);
> -}
> -
> -static always_inline long sys_waitpid(int pid, int *status, int options)
> -{
> -       return syscall4(__NR_wait4, pid, (unsigned long)status, options, 0);
> -}
> -
> -static always_inline long sys_exit(unsigned long error_code)
> -{
> -       return syscall1(__NR_exit, error_code);
> -}
> -
> -static always_inline unsigned long sys_getpid(void)
> -{
> -       return syscall0(__NR_getpid);
> -}
> -
> -static always_inline unsigned long sys_gettid(void)
> -{
> -       return syscall0(__NR_gettid);
> -}
> -
> -static always_inline long sys_unlink(char *pathname)
> -{
> -       return syscall1(__NR_unlink, (unsigned long)pathname);
> -}
> -
> -/*
> - * Note this call expects a signal frame on stack
> - * (regs->sp) so be very carefull here!
> - */
> -static always_inline long sys_rt_sigreturn(void)
> -{
> -       return syscall0(__NR_rt_sigreturn);
> -}
> -
> -static always_inline long sys_sigprocmask(int how, k_rtsigset_t *set,
> -               k_rtsigset_t *old)
> -{
> -       return syscall4(__NR_rt_sigprocmask, how, (unsigned long)set,
> -                       (unsigned long)old, (unsigned long)sizeof(k_rtsigset_t));
> -}
> -
> -static always_inline long sys_set_thread_area(user_desc_t *info)
> -{
> -       return syscall1(__NR_set_thread_area, (long)info);
> -}
> -
> -static always_inline long sys_get_thread_area(user_desc_t *info)
> -{
> -       return syscall1(__NR_get_thread_area, (long)info);
> -}
> -
> -static always_inline long sys_arch_prctl(int code, void *addr)
> -{
> -       return syscall2(__NR_arch_prctl, code, (unsigned long)addr);
> -}
> -
> -static always_inline long sys_prctl(int code, unsigned long arg2, unsigned long arg3,
> -                                   unsigned long arg4, unsigned long arg5)
> -{
> -       return syscall5(__NR_prctl, code, arg2, arg3, arg4, arg5);
> -}
> -
> -static always_inline long sys_brk(unsigned long arg)
> -{
> -       return syscall1(__NR_brk, arg);
> -}
> -
> -static always_inline long sys_clone(unsigned long flags, void *child_stack,
> -                                   void *parent_tid, void *child_tid)
> -{
> -       return syscall4(__NR_clone, flags, (unsigned long)child_stack,
> -                       (unsigned long)parent_tid, (unsigned long)child_tid);
> -}
> -
> -static always_inline long sys_futex(u32 *uaddr, int op, u32 val,
> -                                   struct timespec *utime,
> -                                   u32 *uaddr2, u32 val3)
> -{
> -       return syscall6(__NR_futex, (unsigned long)uaddr,
> -                       (unsigned long)op, (unsigned long)val,
> -                       (unsigned long)utime,
> -                       (unsigned long)uaddr2,
> -                       (unsigned long)val3);
> -}
> -
> -static always_inline long sys_flock(unsigned long fd, unsigned long cmd)
> -{
> -       return syscall2(__NR_flock, fd, cmd);
> -}
> -
> -static void always_inline local_sleep(long seconds)
> -{
> -       struct timespec req, rem;
> -
> -       req = (struct timespec){
> -               .tv_sec         = seconds,
> -               .tv_nsec        = 0,
> -       };
> -
> -       sys_nanosleep(&req, &rem);
> -}
> -
> -static long always_inline sys_kill(long pid, int sig)
> -{
> -       return syscall2(__NR_kill, pid, (long)sig);
> -}
> -
> -static long always_inline sys_tgkill(long tgid, long pid, int sig)
> -{
> -       return syscall3(__NR_tgkill, tgid, pid, (long)sig);
> -}
> -
> -static long always_inline sys_msync(void *addr, unsigned long length, int flags)
> -{
> -       return syscall3(__NR_msync, (long)addr, length, (long)flags);
> -}
> -
> -static long always_inline sys_setns(int fd, int nstype)
> -{
> -       return syscall2(__NR_setns, (long)fd, (long)nstype);
> -}
> -
> -static long sys_setresuid(int uid, int euid, int suid)
> -{
> -       return syscall3(__NR_setresuid, (long)uid, (long)euid, (long)suid);
> -}
> -
> -static long sys_setresgid(int gid, int egid, int sgid)
> -{
> -       return syscall3(__NR_setresgid, (long)gid, (long)egid, (long)sgid);
> -}
> -
> -static long sys_setfsuid(int fsuid)
> -{
> -       return syscall1(__NR_setfsuid, (long)fsuid);
> -}
> -
> -static long sys_setfsgid(int fsgid)
> -{
> -       return syscall1(__NR_setfsgid, (long)fsgid);
> -}
> -
> -struct cap_header {
> -       u32 version;
> -       int pid;
> -};
> -
> -struct cap_data {
> -       u32 eff;
> -       u32 prm;
> -       u32 inh;
> -};
> -
> -static long sys_capset(struct cap_header *h, struct cap_data *d)
> -{
> -       return syscall2(__NR_capset, (long)h, (long)d);
> -}
> -
> -static int sys_socket(int domain, int type, int protocol)
> -{
> -       return syscall3(__NR_socket, (long) domain, (long) type, (long) protocol);
> -}
> -
> -struct sockaddr;
> -static int sys_bind(int sockfd, const struct sockaddr *addr, int addrlen)
> -{
> -       return syscall3(__NR_bind, (long)sockfd, (long)addr, (long) addrlen);
> -}
> -
> -struct msghdr;
> -static long sys_sendmsg(int sockfd, const struct msghdr *msg, int flags)
> -{
> -       return syscall3(__NR_sendmsg, (long)sockfd, (long)msg, (long) flags);
> -}
> -
> -static long sys_recvmsg(int sockfd, struct msghdr *msg, int flags)
> -{
> -       return syscall3(__NR_recvmsg, (long)sockfd, (long)msg, (long) flags);
> -}
> -
> -static long always_inline sys_getsockopt(int sockfd, int level, int optname,
> -                                        const void *optval, socklen_t *optlen)
> -{
> -       return syscall5(__NR_getsockopt, (unsigned long)sockfd,
> -                       (unsigned long)level, (unsigned long)optname,
> -                       (unsigned long)optval, (unsigned long)optlen);
> -}
> -
> -static long always_inline sys_setsockopt(int sockfd, int level, int optname,
> -                                        const void *optval, socklen_t optlen)
> -{
> -       return syscall5(__NR_setsockopt, (unsigned long)sockfd,
> -                       (unsigned long)level, (unsigned long)optname,
> -                       (unsigned long)optval, (unsigned long)optlen);
> -}
> -
> -static void sys_set_tid_address(int *tid_addr) {
> -       syscall1(__NR_set_tid_address, (long) tid_addr);
> -}
> -
> -static long always_inline
> -sys_kcmp(pid_t pid1, pid_t pid2, int type, unsigned long idx1, unsigned long idx2)
> -{
> -       return syscall5(__NR_kcmp, (long)pid1, (long)pid2, (long)type, idx1, idx2);
> -}
> -
> -static long always_inline sys_fcntl(int fd, int type, long arg)
> -{
> -       return syscall3(__NR_fcntl, (long)fd, (long)type, (long)arg);
> -}
> -
> -#ifndef F_GETFD
> -#define F_GETFD 1
> -#endif
> -
> -#ifndef CLONE_NEWPID
> -#define CLONE_NEWPID   0x20000000
> -#endif
> -
> -#ifndef CLONE_NEWUTS
> -#define CLONE_NEWUTS   0x04000000
> -#endif
> -
> -#ifndef CLONE_NEWIPC
> -#define CLONE_NEWIPC   0x08000000
> -#endif
> -
> -#define setns  sys_setns
> -
> -#else /* CONFIG_X86_64 */
> -# error x86-32 bit mode not yet implemented
> -#endif /* CONFIG_X86_64 */
> -
> -#endif /* CR_SYSCALL_H_ */
> diff --git a/parasite.c b/parasite.c
> index e34299e..407376a 100644
> --- a/parasite.c
> +++ b/parasite.c
> @@ -45,7 +45,7 @@ static int brk_init(void)
>         /*
>          *  Map 10 MB. Hope this will be enough for unix skb's...
>          */
> -       ret = sys_mmap(0, MAX_HEAP_SIZE,
> +       ret = sys_mmap(NULL, MAX_HEAP_SIZE,
>                             PROT_READ | PROT_WRITE,
>                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>         if (ret < 0)
> @@ -165,7 +165,7 @@ static int dump_pages(struct parasite_dump_pages_args *args)
>         if (!(args->vma_entry.prot & PROT_READ)) {
>                 prot_old = (unsigned long)args->vma_entry.prot;
>                 prot_new = prot_old | PROT_READ;
> -               ret = sys_mprotect((unsigned long)args->vma_entry.start,
> +               ret = sys_mprotect((void *)args->vma_entry.start,
>                                    (unsigned long)vma_entry_len(&args->vma_entry),
>                                    prot_new);
>                 if (ret) {
> @@ -180,7 +180,7 @@ static int dump_pages(struct parasite_dump_pages_args *args)
>          * so stick for mincore as a basis.
>          */
> 
> -       ret = sys_mincore((unsigned long)args->vma_entry.start, length, map);
> +       ret = sys_mincore((void *)args->vma_entry.start, length, map);
>         if (ret) {
>                 sys_write_msg("sys_mincore failed\n");
>                 SET_PARASITE_RET(st, ret);
> @@ -215,7 +215,7 @@ static int dump_pages(struct parasite_dump_pages_args *args)
>          * Don't left pages readable if they were not.
>          */
>         if (prot_old != prot_new) {
> -               ret = sys_mprotect((unsigned long)args->vma_entry.start,
> +               ret = sys_mprotect((void *)args->vma_entry.start,
>                                    (unsigned long)vma_entry_len(&args->vma_entry),
>                                    prot_old);
>                 if (ret) {
> @@ -255,7 +255,7 @@ static int dump_sigact(parasite_status_t *st)
>                 if (sig == SIGKILL || sig == SIGSTOP)
>                         continue;
> 
> -               ret = sys_sigaction(sig, NULL, &act);
> +               ret = sys_sigaction(sig, NULL, &act, sizeof(rt_sigset_t));
>                 if (ret < 0) {
>                         sys_write_msg("sys_sigaction failed\n");
>                         SET_PARASITE_RET(st, ret);
> @@ -409,7 +409,7 @@ static int init(struct parasite_init_args *args)
>         }
> 
>         ksigfillset(&to_block);
> -       ret = sys_sigprocmask(SIG_SETMASK, &to_block, &old_blocked);
> +       ret = sys_sigprocmask(SIG_SETMASK, &to_block, &old_blocked, sizeof(k_rtsigset_t));
>         if (ret < 0)
>                 reset_blocked = ret;
>         else
> @@ -436,7 +436,7 @@ static int parasite_set_logfd(parasite_status_t *st)
>  static int fini(void)
>  {
>         if (reset_blocked == 1)
> -               sys_sigprocmask(SIG_SETMASK, &old_blocked, NULL);
> +               sys_sigprocmask(SIG_SETMASK, &old_blocked, NULL, sizeof(k_rtsigset_t));
>         sys_close(logfd);
>         sys_close(tsock);
>         brk_fini();
> diff --git a/restorer.c b/restorer.c
> index 4cf688b..7af708d 100644
> --- a/restorer.c
> +++ b/restorer.c
> @@ -184,7 +184,7 @@ long __export_restore_thread(struct thread_restore_args *args)
>         CPREGT1(fs);
> 
>         fsgs_base = core_entry->arch.gpregs.fs_base;
> -       ret = sys_arch_prctl(ARCH_SET_FS, (void *)fsgs_base);
> +       ret = sys_arch_prctl(ARCH_SET_FS, fsgs_base);
>         if (ret) {
>                 write_num_n(__LINE__);
>                 write_num_n(ret);
> @@ -192,7 +192,7 @@ long __export_restore_thread(struct thread_restore_args *args)
>         }
> 
>         fsgs_base = core_entry->arch.gpregs.gs_base;
> -       ret = sys_arch_prctl(ARCH_SET_GS, (void *)fsgs_base);
> +       ret = sys_arch_prctl(ARCH_SET_GS, fsgs_base);
>         if (ret) {
>                 write_num_n(__LINE__);
>                 write_num_n(ret);
> @@ -349,9 +349,9 @@ long __export_restore_task(struct task_restore_core_args *args)
>         rt_sigaction_t act;
> 
>         task_entries = args->task_entries;
> -       sys_sigaction(SIGCHLD, NULL, &act);
> +       sys_sigaction(SIGCHLD, NULL, &act, sizeof(rt_sigset_t));
>         act.rt_sa_handler = sigchld_handler;
> -       sys_sigaction(SIGCHLD, &act, NULL);
> +       sys_sigaction(SIGCHLD, &act, NULL, sizeof(rt_sigset_t));
> 
>         restorer_set_logfd(args->logfd);
> 
> @@ -460,7 +460,7 @@ long __export_restore_task(struct task_restore_core_args *args)
>                 if (vma_entry->prot & PROT_WRITE)
>                         continue;
> 
> -               sys_mprotect(vma_entry->start,
> +               sys_mprotect((void *)vma_entry->start,
>                              vma_entry_len(vma_entry),
>                              vma_entry->prot);
>         }
> @@ -538,7 +538,7 @@ long __export_restore_task(struct task_restore_core_args *args)
>         CPREG1(fs);
> 
>         fsgs_base = core_entry->arch.gpregs.fs_base;
> -       ret = sys_arch_prctl(ARCH_SET_FS, (void *)fsgs_base);
> +       ret = sys_arch_prctl(ARCH_SET_FS, fsgs_base);
>         if (ret) {
>                 write_num_n(__LINE__);
>                 write_num_n(ret);
> @@ -546,7 +546,7 @@ long __export_restore_task(struct task_restore_core_args *args)
>         }
> 
>         fsgs_base = core_entry->arch.gpregs.gs_base;
> -       ret = sys_arch_prctl(ARCH_SET_GS, (void *)fsgs_base);
> +       ret = sys_arch_prctl(ARCH_SET_GS, fsgs_base);
>         if (ret) {
>                 write_num_n(__LINE__);
>                 write_num_n(ret);
> @@ -687,7 +687,7 @@ long __export_restore_task(struct task_restore_core_args *args)
> 
>         futex_wait_while(&args->task_entries->start, CR_STATE_RESTORE);
> 
> -       sys_sigaction(SIGCHLD, &args->sigchld_act, NULL);
> +       sys_sigaction(SIGCHLD, &args->sigchld_act, NULL, sizeof(rt_sigset_t));
> 
>         futex_dec_and_wake(&args->task_entries->nr_in_progress);
> 
> diff --git a/syscall-common.S b/syscall-common.S
> new file mode 100644
> index 0000000..84bcd8b
> --- /dev/null
> +++ b/syscall-common.S
> @@ -0,0 +1,16 @@
> +#include "linkage.h"
> +
> +#define SYSCALL(name, opcode)          \
> +       ENTRY(name);                    \
> +       movl    $opcode, %eax;          \
> +       jmp     __syscall_common;       \
> +       END(name)
> +
> +       .text
> +       .align  4
> +
> +ENTRY(__syscall_common)
> +       movq    %rcx, %r10
> +       syscall
> +       ret
> +END(__syscall_common)
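
A note on the stub above, for readers following along. The C calling convention on x86-64 passes the fourth integer argument in %rcx, but the `syscall` instruction overwrites %rcx with the return address, so the kernel expects the fourth argument in %r10. That is what the `movq %rcx, %r10` does. Below is a minimal C sketch of the same register assignment written as inline assembly. `raw_syscall4`, `query_sigmask` and `KERNEL_SIGSET_SIZE` are hypothetical names used for illustration only and are not part of the patch. The 8-byte sigset size is an assumption that holds on x86-64. On other targets the sketch falls back to libc's `syscall()`, which reports errors as -1 with `errno` set rather than as a negative error code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: the kernel sigset is 64 bits wide, as on x86-64. */
#define KERNEL_SIGSET_SIZE 8

/* Hypothetical helper (not from the patch): issue a 4-argument syscall. */
static long raw_syscall4(long nr, long a0, long a1, long a2, long a3)
{
#if defined(__linux__) && defined(__x86_64__)
	long ret;
	/* The 4th argument must live in r10, not rcx. */
	register long r10 __asm__("r10") = a3;

	__asm__ volatile("syscall"
			 : "=a"(ret)
			 : "0"(nr), "D"(a0), "S"(a1), "d"(a2), "r"(r10)
			 : "rcx", "r11", "memory"); /* syscall clobbers rcx and r11 */
	return ret;
#else
	/* Fallback for other targets: let libc place the arguments. */
	return syscall(nr, a0, a1, a2, a3);
#endif
}

/*
 * Read the current blocked-signal mask with rt_sigprocmask.
 * With set == NULL the 'how' argument is ignored by the kernel;
 * sigsetsize is the 4th argument and therefore travels in r10.
 */
static long query_sigmask(unsigned long *mask, long sigsetsize)
{
	assert(mask != NULL);
	return raw_syscall4(SYS_rt_sigprocmask, SIG_BLOCK, 0,
			    (long)mask, sigsetsize);
}
```

Calling `query_sigmask(&mask, KERNEL_SIGSET_SIZE)` should succeed, while a wrong size such as 1 is rejected by the kernel. This is why the sys_sigaction and sys_sigprocmask callers in this patch now pass the sigset size explicitly.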
> diff --git a/syscalls.pl b/syscalls.pl
> new file mode 100644
> index 0000000..f751869
> --- /dev/null
> +++ b/syscalls.pl
> @@ -0,0 +1,51 @@
> +#!/usr/bin/perl -w
> +
> +my($in, $codes, $protos, $asm, $asmcommon, $proto_types) = @ARGV;
> +
> +open(IN, "< $in") or die "$0: cannot open: $in\n";
> +open(CODES, "> $codes") or die "$0: cannot open: $codes\n";
> +open(PROTOS, "> $protos") or die "$0: cannot open: $protos\n";
> +open(ASM, "> $asm") or die "$0: cannot open: $asm\n";
> +
> +$codes =~ s/include\///g;
> +$protos =~ s/include\///g;
> +$proto_types =~ s/include\///g;
> +
> +print ASM "/* Autogenerated, don't edit */\n";
> +print ASM "#include \"$codes\"\n\n";
> +print ASM "#include \"$asmcommon\"\n";
> +
> +my($codes_def, $protos_def) = ($codes, $protos);
> +
> +$codes_def =~ s/[\s|\t|\-|\.|\/]/_/g;
> +$protos_def =~ s/[\s|\t|\-|\.|\/]/_/g;
> +
> +print CODES  "/* Autogenerated, don't edit */\n#ifndef $codes_def\n#define $codes_def\n";
> +print PROTOS "/* Autogenerated, don't edit */\n#ifndef $protos_def\n#define $protos_def\n";
> +print PROTOS "#include \"$proto_types\"\n";
> +print PROTOS "#include \"$codes\"\n";
> +
> +while (defined($line = <IN>)) {
> +       chomp $line;
> +       $line =~ s/^\s+//;
> +       $line =~ s/\s*\#.*$//;
> +       next if ($line eq '');
> +
> +       my(@field) = split(/\t+/, $line);
> +
> +       if ($#field >= 1) {
> +               print CODES "#define $field[0] $field[1]\n";
> +       }
> +
> +       if ($#field >= 2) {
> +               print PROTOS "extern long $field[2]$field[3];\n";
> +               print ASM "SYSCALL($field[2], $field[0])\n";
> +       }
> +}
> +
> +print CODES "#endif /* $codes_def */\n";
> +print PROTOS "#endif /* $protos_def */\n";
> +
> +close(IN);
> +close(CODES);
> +close(PROTOS);
> --
> 1.7.7.6
> 
> .
> 
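
To make the per-line logic of syscalls.pl easier to follow, here is a rough C sketch of what its loop does with one definition line: strip leading whitespace and the trailing comment, split on tabs, then format the three outputs. `gen_line` is a hypothetical helper, and the line layout (code name, number, function name, argument list) is inferred from how the script indexes its fields, not taken from syscall.def itself. Note that the quoted script tests `$#field >= 2` but then reads `$field[3]`; the sketch requires all four fields before emitting a prototype.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse one definition line in place and format
 * the three output lines into caller-supplied buffers of size n.
 * Returns the number of tab-separated fields found (0 if the line is
 * blank or holds only a comment).
 */
static int gen_line(char *line, char *code, char *proto, char *stub, size_t n)
{
	char *field[4] = { 0 };
	char *p, *hash;
	size_t len;
	int nr = 0;

	assert(line && code && proto && stub && n > 0);
	code[0] = proto[0] = stub[0] = '\0';

	/* Strip leading whitespace and anything from '#' onwards. */
	line += strspn(line, " \t");
	hash = strchr(line, '#');
	if (hash)
		*hash = '\0';

	/* Trim trailing whitespace, including the newline. */
	len = strlen(line);
	while (len && strchr(" \t\r\n", line[len - 1]))
		line[--len] = '\0';
	if (!len)
		return 0;

	/* Split on runs of tabs, keeping at most four fields. */
	for (p = strtok(line, "\t"); p && nr < 4; p = strtok(NULL, "\t"))
		field[nr++] = p;

	if (nr >= 2)
		snprintf(code, n, "#define %s %s", field[0], field[1]);
	if (nr >= 4) {
		snprintf(proto, n, "extern long %s%s;", field[2], field[3]);
		snprintf(stub, n, "SYSCALL(%s, %s)", field[2], field[0]);
	}
	return nr;
}
```

For a made-up line such as `__NR_read<TAB>0<TAB>sys_read<TAB>(int fd, void *buf, unsigned long count)` this fills in one `#define`, one `extern long` prototype and one `SYSCALL(...)` stub invocation, which matches the three files the script writes.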


