[CRIU] Trying to checkpoint/restart VxSim (VxWorks)

Leek, Jim leek2 at llnl.gov
Wed Jun 23 21:45:36 MSK 2021


OS: Red Hat Enterprise Linux 8.4
CRIU version: 3.15 (installed from the yum repository)

I'm very new to CRIU, and I'm trying to figure out whether it's possible to checkpoint the VxWorks simulator, VxSim. VxWorks is a real-time operating system (RTOS) from Wind River, so VxWorks images are intended to run on embedded systems of various kinds. However, it comes with a "simulator" called VxSim that works a bit like Wine.
You build your VxWorks image as an ELF executable with the "VxSim linux" target, then launch VxSim, handing it your image. VxSim starts and launches a subprocess that actually runs the image. There is at least one pipe from the child to the parent. Somehow, system calls from the image get intercepted and handled by VxSim. All of this is a bit foggy because VxSim is proprietary and I don't have the source code; everything here is gleaned from tools like ps and lsof, which I am definitely not a pro at using.
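Since the process layout above was pieced together with ps and lsof, a small script that reads /proc directly may make it easier to see the parent, its children, and what each file descriptor points to (pipe:[inode], socket:[inode], or a file path). This is a minimal sketch, not part of CRIU or VxSim; it assumes Linux /proc, Python 3.6+ (the RHEL 8 default), and permission to read the target's /proc/<pid>/fd (same user, or sudo). The script name `proctree.py` used below is hypothetical.

```python
import os
import sys
from typing import Dict, List


def children_of(pid: int) -> List[int]:
    """Return the PIDs whose parent is `pid`, by scanning /proc/<n>/stat."""
    kids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open("/proc/%s/stat" % entry) as f:
                stat = f.read()
        except OSError:
            continue  # the process exited or is not readable
        # Field 2 (the command name) is in parentheses and may contain spaces,
        # so skip past the last ')' before splitting; the remaining fields
        # start with state, then ppid.
        fields = stat[stat.rindex(")") + 2:].split()
        if int(fields[1]) == pid:
            kids.append(int(entry))
    return sorted(kids)


def fd_targets(pid: int) -> Dict[int, str]:
    """Map each open fd of `pid` to its target, e.g. 'pipe:[12345]'."""
    targets = {}
    fd_dir = "/proc/%d/fd" % pid
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # the fd was closed between listdir() and readlink()
    return targets


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1].isdigit():
    root = int(sys.argv[1])
    for pid in [root] + children_of(root):
        print("PID %d" % pid)
        for fd, target in sorted(fd_targets(pid).items()):
            print("  fd %3d -> %s" % (fd, target))
```

Usage: `sudo python3 proctree.py <parentPID>`. An fd that shows the same pipe:[N] in both the parent and the child is the pipe between them; a socket:[N] entry can be looked up by inode in `ss -xp` output to see whether it is a unix socket and what it is connected to.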

I suppose the ideal behavior would be to checkpoint the parent and have that also capture the child, and likewise for restore. Generally, if I try to checkpoint the parent, I get some kind of reasonable error. If I checkpoint only the child, the parent stays around and can be interacted with, but the child can't be restored without some way to restore the pipe (I think).
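Because VxSim is proprietary, a stand-in with the same visible shape may be useful for experiments: a parent that forks a child and reads from a child-to-parent pipe. This sketch only mimics what ps and lsof show; it does not reproduce whatever VxSim does internally, and the script name `pipepair.py` is hypothetical. Since `criu dump -t <PID>` checkpoints the whole process tree rooted at that PID, dumping the stand-in's parent should capture the child and the pipe as well, which gives a baseline to compare VxSim against.

```python
import os
import sys
import time


def run(beats: int = -1, interval: float = 1.0) -> int:
    """Fork a child that writes a heartbeat line into a pipe every `interval`
    seconds while the parent reads and echoes each line.

    A negative `beats` means run forever; otherwise returns the number of
    lines the parent received once the child has finished."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            # Child: keep only the write end (the child -> parent pipe).
            os.close(read_fd)
            n = 0
            while beats < 0 or n < beats:
                os.write(write_fd, b"beat %d\n" % n)
                n += 1
                time.sleep(interval)
            os.close(write_fd)
        finally:
            os._exit(0)  # never fall back into the parent's code path
    # Parent: keep only the read end; EOF arrives when the child closes its end.
    os.close(write_fd)
    received = 0
    with os.fdopen(read_fd, "rb") as pipe:
        for line in pipe:
            received += 1
            print("parent %d got: %s" % (os.getpid(), line.decode().rstrip()), flush=True)
    os.waitpid(pid, 0)
    return received


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1].lstrip("-").isdigit():
    print("parent PID %d" % os.getpid(), flush=True)
    run(beats=int(sys.argv[1]))
```

For example, `python3 pipepair.py -1` runs until killed and prints the parent PID to pass to `criu dump -t`. Started from an interactive shell, it will probably need `--shell-job`, as in the attempts below.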

I have tried a variety of criu options and tried to follow the wiki, but I got out of my depth. To keep this first email short, I'll just give what I think is my most reasonable attempt. I heard criu-ns helps with some of these issues, so I tried that:

$ sudo ./criu-ns dump -t <parentPID> --shell-job --ghost-limit 1G -D check --ext-unix-sk
['./criu-ns', 'dump', '-t', <parentPID>, '--shell-job', '--ghost-limit', '1G', '-D', 'check', '--ext-unix-sk']
Warn  (criu/files-reg.c:1510): Couldn't find the build-id note for file with fd 14
Warn  (criu/files-reg.c:1510): Couldn't find the build-id note for file with fd 15
Warn  (compel/arch/x86/src/lib/infect.c:281): Will restore 383681 with interrupted system call
Error (criu/sk-unix.c:815): unix: Can't dump half of stream unix connection.
Error (criu/cr-dump.c:1768): Dumping FAILED.
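The error above suggests that a unix stream socket in the dumped tree is connected to a peer CRIU could not account for, most likely one held by a process outside the tree. `ss -xp` lists unix sockets with their own inode and the peer's inode; given an inode, the hypothetical helper below finds every process holding it, which can show whether the other end lives outside the VxSim tree. It is a sketch that scans /proc/*/fd and only sees processes you are allowed to inspect (run it with sudo to see everything); the script name `holders.py` is an assumption.

```python
import os
import re
import sys
from typing import Dict, List

# Matches fd targets such as "pipe:[12345]" or "socket:[67890]".
_TARGET = re.compile(r"^(pipe|socket):\[(\d+)\]$")


def holders(inode: int) -> Dict[int, List[int]]:
    """Return {pid: [fds]} for every process with an fd on pipe/socket `inode`."""
    found: Dict[int, List[int]] = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        fd_dir = "/proc/%s/fd" % entry
        try:
            names = os.listdir(fd_dir)
        except OSError:
            continue  # exited, or not ours to inspect
        for name in names:
            try:
                target = os.readlink(os.path.join(fd_dir, name))
            except OSError:
                continue  # the fd was closed meanwhile
            m = _TARGET.match(target)
            if m and int(m.group(2)) == inode:
                found.setdefault(int(entry), []).append(int(name))
    return found


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1].isdigit():
    for pid, fds in sorted(holders(int(sys.argv[1])).items()):
        print("PID %d holds inode %s on fds %s" % (pid, sys.argv[1], sorted(fds)))
```

For a pipe, both ends share one inode, so this lists the parent and the child together; for a unix socket pair, each end has its own inode, so look up the peer inode that `ss -xp` reports.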

----
But then I ran it again, and it... kinda worked? After restore, the child process wasn't running, but I could interact slightly with the parent (that is, it acted hung, but when I gave it the exit command it exited correctly). Further attempts to restore didn't work, though, which seems weird.
$ sudo ./criu-ns dump -t 398070 --shell-job --ghost-limit 1G -D check --ext-unix-sk
['./criu-ns', 'dump', '-t', '398070', '--shell-job', '--ghost-limit', '1G', '-D', 'check', '--ext-unix-sk']
Warn  (criu/files-reg.c:1510): Couldn't find the build-id note for file with fd 14
Warn  (criu/files-reg.c:1510): Couldn't find the build-id note for file with fd 16
Warn  (compel/arch/x86/src/lib/infect.c:281): Will restore 398071 with interrupted system call

$ sudo ./criu-ns restore --shell-job --ghost-limit 1G -D check --ext-unix-sk
['./criu-ns', 'restore', '--shell-job', '--ghost-limit', '1G', '-D', 'check', '--ext-unix-sk']
398070: Warn  (criu/files-reg.c:1786): Can't link var/lib/sss/mc/passwd.cr.1.ghost -> var/lib/sss/mc/passwd
398070: Error (criu/sk-inet.c:1028): inet: Can't bind inet socket (id 53): Address already in use
398070: Error (criu/files.c:1216): Unable to open fd=9 id=0x35
Error (criu/cr-restore.c:2483): Restoring FAILED.
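This first restore most likely failed because a process from the original run was still bound to the address, which fits with the next attempt getting further after the leftover process was killed. A quick bind test before restoring can show whether the address is still taken. The port is not in the log ("id 53" is CRIU's internal object id, not a port number), so the port passed to this hypothetical sketch is a placeholder you would need to find, e.g. with `ss -ltnp`; it also assumes TCP over IPv4 (use `SOCK_DGRAM` for UDP).

```python
import errno
import socket
import sys


def tcp_port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if binding a TCP socket to (host, port) fails with EADDRINUSE."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
    except OSError as e:
        if e.errno == errno.EADDRINUSE:
            return True
        raise  # e.g. EACCES for ports below 1024 when not root
    finally:
        s.close()
    return False


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1].isdigit():
    port = int(sys.argv[1])  # placeholder: the real port is not in the CRIU log
    print("in use" if tcp_port_in_use(port) else "free")
```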

$ kill <child pid>

$ sudo ./criu-ns restore --shell-job --ghost-limit 1G -D check --ext-unix-sk
['./criu-ns', 'restore', '--shell-job', '--ghost-limit', '1G', '-D', 'check', '--ext-unix-sk']
398070: Warn  (criu/files-reg.c:1786): Can't link var/lib/sss/mc/passwd.cr.1.ghost -> var/lib/sss/mc/passwd
398070: Warn  (criu/files-reg.c:1786): Can't link var/lib/sss/mc/passwd.cr.1.ghost -> var/lib/sss/mc/passwd

# Restored to a kind of hung state. I passed the exit command here, and it exited correctly.
Goodbye.

$ sudo ./criu-ns restore --shell-job --ghost-limit 1G -D check --ext-unix-sk
['./criu-ns', 'restore', '--shell-job', '--ghost-limit', '1G', '-D', 'check', '--ext-unix-sk']
398070: Warn  (criu/files-reg.c:1786): Can't link var/lib/sss/mc/passwd.cr.1.ghost -> var/lib/sss/mc/passwd
398070: Error (criu/files-reg.c:2221): Can't open file tmp/.vxsim_leek2_1 on restore: No such file or directory
398070: Error (criu/files-reg.c:2161): Can't open file tmp/.vxsim_leek2_1: No such file or directory
398070: Error (criu/files.c:1216): Unable to open fd=4 id=0x2f
Error (criu/cr-restore.c:2483): Restoring FAILED.

$ sudo ./criu-ns restore --shell-job --ghost-limit 1G -D check --ext-unix-sk
['./criu-ns', 'restore', '--shell-job', '--ghost-limit', '1G', '-D', 'check', '--ext-unix-sk']
398070: Warn  (criu/files-reg.c:1786): Can't link var/lib/sss/mc/passwd.cr.1.ghost -> var/lib/sss/mc/passwd
398070: Error (criu/files-reg.c:2221): Can't open file tmp/.vxsim_leek2_1 on restore: No such file or directory
398070: Error (criu/files-reg.c:2161): Can't open file tmp/.vxsim_leek2_1: No such file or directory
398070: Error (criu/files.c:1216): Unable to open fd=4 id=0x2f
Error (criu/cr-restore.c:2483): Restoring FAILED.
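The later restores fail because /tmp/.vxsim_leek2_1 no longer exists; it was presumably removed when the restored VxSim exited, and CRIU expects regular files that were open (and not unlinked) at dump time to still be present at restore time. One workaround is to save a copy right after dumping and put it back before each restore. This sketch assumes the file is a regular file whose content does not change between dump and restore; the stash directory name and the script name `stash.py` are hypothetical.

```python
import os
import shutil
import sys

# Hypothetical subdirectory inside the CRIU image directory for saved copies.
STASH = "stashed-files"


def stash(path: str, image_dir: str) -> str:
    """Copy `path` (assumed to be a regular file) into image_dir/STASH."""
    dest_dir = os.path.join(image_dir, STASH)
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(path))
    shutil.copy2(path, dest)  # keeps mode and timestamps, but not ownership
    return dest


def unstash(path: str, image_dir: str) -> bool:
    """Put the stashed copy back at `path` if it is missing; True if restored."""
    if os.path.exists(path):
        return False
    shutil.copy2(os.path.join(image_dir, STASH, os.path.basename(path)), path)
    return True


if __name__ == "__main__" and len(sys.argv) == 4 and sys.argv[1] in ("stash", "unstash"):
    action, target, images = sys.argv[1:]
    print((stash if action == "stash" else unstash)(target, images))
```

For example, `sudo python3 stash.py stash /tmp/.vxsim_leek2_1 check` right after the dump, then `sudo python3 stash.py unstash /tmp/.vxsim_leek2_1 check` before each `criu-ns restore ... -D check`. Run both steps as the same user, since ownership is not preserved.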

So, is this possible and I'm just doing it wrong, or is this kind of thing not supported?

Thanks,
Jim

