[CRIU] Restore failed. Exit code: 43
Pavel Emelyanov
xemul at parallels.com
Tue Jan 20 07:13:19 PST 2015
On 01/20/2015 05:57 PM, Paschalis Mpeis wrote:
> Thanks for your replies Pavel.
> Please see inline!
>
> My 1st try on restore:
>
>
> >
> > criu_set_log_file("restore.log");
> > pid = criu_restore_child();
> > if (pid <= 0) {
> >     what_err_ret_mean(pid);
> >     exit(-1);
> > }
> >
> > if (waitpid(pid, &ret, 0) < 0) {
> >     perror("Can't wait for restore");
> >     kill(pid, SIGKILL);
> >     exit(-1);
> > }
> > return chk_exit(ret, SUCC_DUMP_ECODE);
> >
> >
> > chk_exit was printing the "exit 43" message. It is the function found here:
> > https://github.com/xemul/criu/blob/master/test/libcriu/lib.c
>
> Ah, I see :) Then everything seems to be OK. Look, when you call criu_dump()
> with zero pid, what gets dumped is the process that makes this call, in the state
> where it has just sent the dump request to the service. And this particular state
> is written into the images.
>
> Thus, when you call criu_restore_child() the restored process gets restored in
> a state where it "thinks" as if it has just received the response from the "dump"
> request.
>
> Then, in the code above, this situation ends up in ret = SUCC_RSTR_ECODE (which
> is 43) and exit(ret), which is thus exit(43). Then the parent wait()-s for the
> kid, checks its exit code, sees that it is 43, and prints it on the screen.
>
> And this is what I see in your logs :) So, for now, everything looks to be working as
> expected. If you think it's not, let's discuss how you expect it to work, and we'll
> try to help.
>
>
> My second try on restore:
>
>
> > Then I changed the restore code simply:
> >
> > criu_set_log_file("restore.log");
> > criu_restore();
> >
> >
> > This produces a similar restore.log, with success messages, but the program does not seem to continue.
>
> But according to the code you have shown, the program continues by returning
> from the criu_dump() call, then sets ret to SUCC_RSTR_ECODE == 43, then calls
> exit().
>
>
>
> On my 1st try, the restore function was calling "chk_exit".
> But this function never calls "exit". Where did the exit occur?
Inside the restored process.
> Also, on the second try, I simply call criu_restore().
> After the restore, I simply return 0, and then the execution flow goes to the main function,
> which then simply returns.
But criu_restore() creates a new process and puts it into a state where it
returns from criu_dump() and calls exit() anyway.
> But I do not see anything being replayed, e.g. the output of the initial execution
> (that I captured earlier) does not show up again. What am I doing wrong here?
>
> My main function is very simple, like this:
>
> if(crMode==CAPTURE){
> runLinpack();
> // in above function, at some point of execution,
> // I initialise_criu, and then I dump the application
AFAIU you dump it w/o specifying the pid, so what you dump is this process itself.
> }
> else if(crMode==REPLAY){
> initialise_criu(CRIU_IMG_DIR);
> restoreApplication();
In this call you will create a new process, that would then be restored in the
state as if it has just called the criu_dump() method of the library. Then the
execution of _this_ _new_ process will continue, the task (according to the
dumping code you've shown) will call exit(43) and that's it.
> }
> return 0;
>
>
> I run the main function two separate times. One to capture, but still continue playing the
> application, and another one to replay the application from the earlier snapshot!
I'm now confused, sorry. Can you post
* the full program you run (either variant, but with only one way of calling restore, not two)
* the way you run it (the shell commands you execute, one by one)
* the result you see (if dump and restore are OK, then logs are not required)
* and what you expect it to look like
Thanks,
Pavel