[CRIU] Restore failed. Exit code: 43

Paschalis Mpeis paschalis.mpeis at ed.ac.uk
Wed Jan 21 06:19:24 PST 2015


>
> It's much clearer now, thanks :) So here is what I see happening there.
>
> You start a process A to be the main.c's main with "1" as an argument. At
> some point A calls the
> dumpApplication(). The dumpApplication() fork()-s kid B and A waitpid()-s
> for it. B calls the
> criu_dump() which dumps the B process and leaves it running. B sees 0 ret
> code from criu_dump()
> and then exit()-s with ret-code SUCC_DUMP_ECODE. Its parent (the A task)
> waits for the kid, checks
> for exit status being SUCC_DUMP_ECODE and prints the
>
>     == Captured successfully ==\n\n
>
> message, then dumpApplication() ends, A continues execution. This pretty
> much coincides with what
> is there in the output file.
>
> Next you want to replay and start a new process, C, with the main.c's main as
> entry point and the
> "2" as an argument. OK. In this case the restoreApplication() is called
> immediately. The latter
> calls criu_restore_child(). Now what happens here is complex, confusing,
> but very interesting and
> kinda unavoidable :) C fork()-s a new child process D; when D is created it
> is "restored" by criu and
> is put into the former B's state -- the state as if it is in the
> dumpApplication() call returning
> from the call to criu_dump(), but getting the code 1 (not 0 as it was in
> B) into ret variable in
> there. Next D will behave just like B did with the only difference that
> ret is 1, not 0, which
> will be decoded into SUCC_RSTR_ECODE by this check
>
>      if (ret == 0)
>          ret = SUCC_DUMP_ECODE;
>      else if (ret == 1)
>          ret = SUCC_RSTR_ECODE;
>      else
>          ret = 1;
>
> and then D will call exit(ret) thus exiting with SUCC_RSTR_ECODE code. The
> D's parent (C) will
> be woken up from the waitpid() call (line 119 of crlib.c) and will just
> exit. So this is what you
> should get and do get.
>
>
Hmmm...
There are 4 processes in total: A & B for capture, and C & D for restore.
So for each capture or restore, there are two exit points (when each
process terminates).
Let's name the process that does the CRIU magic for capture the "capturer", and
the process that does the CRIU magic for restore the "restorer".

From what you have told me, I have understood the following:

*Capture:*
Process A is my program. Then, it is forked, so we have B, in which you do
your magic, so my program is captured. B is a "capturer". Right?

So, when B continues, it does stuff unrelated to my program, maybe some
CRIU stuff, and then it finally exits.
Process A then waits for the dump to finish, and when this happens,
it continues execution. Specifically, A will continue executing from line
30 here
<https://gist.github.com/Paschalis/a96b2747ed85b8e5a796#file-linpack_h1_-c-L30>.
Is that correct?

Also, I have a question regarding the call "criu_set_leave_running(true)".
It will be executed by child B, right? Why should I bother setting this,
since B is the capturer?
I thought this setting lets process A (not B) continue its
execution after the dump has occurred.
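
To check my understanding of the capture flow, here is a minimal sketch in C
of how I picture A and B interacting. It does not call CRIU at all: fake_ret
stands in for the value criu_dump() would return in B, and the exit-code
values and the names decode_ret() and run_child() are placeholders I made up
for illustration, not what crlib.c actually uses.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder values (assumption); the real ones are defined in crlib.c. */
#define SUCC_DUMP_ECODE 10
#define SUCC_RSTR_ECODE 11

/* Map a criu_dump()-style return value to an exit code:
 * 0 = just dumped, 1 = just restored, anything else = failure. */
int decode_ret(int ret)
{
    if (ret == 0)
        return SUCC_DUMP_ECODE;
    if (ret == 1)
        return SUCC_RSTR_ECODE;
    return 1;
}

/* Plays the role of A: fork a child (B) that exits with
 * decode_ret(fake_ret), wait for it, and return its exit status.
 * Returns -1 if fork/waitpid fails or the child did not exit normally. */
int run_child(int fake_ret)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child (B): this is where criu_dump() would be called. */
        _exit(decode_ret(fake_ret));
    }

    /* Parent (A): block until B terminates, then inspect its status. */
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, run_child(0) should hand SUCC_DUMP_ECODE back to the parent,
which is the case where A prints the "Captured successfully" message.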


*Replay:*
C starts execution. It calls restoreApplication() here
<https://gist.github.com/Paschalis/a96b2747ed85b8e5a796#file-main-c-L30>.
Then, C is forked so we have process D.
Is C the "restorer"? (I am a bit confused about this.)

Then you say that D is restored into B's state. I do not want this, since B
is the "capturer" and not my program.



> I suspect this is not what you planned to see. Most likely you want D to
> continue doing what A
> was, not B.

Yes, I want precisely this, provided that I have described the processes A,
B, C, and D above correctly.



> In that case you should fix the dumpApplication() code not to exit() upon
> seeing the
> SUCC_RSTR_ECODE, but to return from this function. This is the unavoidable
> nature of dump and
> restore. If you dumped yourself (this is what dumpApplication does) and
> then restored, you get
> back in time into the state where you have been right after you have
> requested to dump yourself.
>
> The mentioned check for ret that sets one of SUCC_*_ECODE values is
> differentiating these two
> cases -- whether you have just been dumped, or have just been restored.
>
> Is this explanation clear and helpful?
>
>
So basically, I will exploit the fact that D magically travels back in time into
the "dumpApplication" function, right after the dump, and I will not terminate
it. I will try this right away. I hope that I won't run into further
problems! :)
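
If I understood the suggestion correctly, the change would look roughly like
the sketch below. It is only a sketch under assumptions: do_dump is a function
pointer standing in for criu_dump() (which I take to return 0 after a dump, 1
after a restore, and a negative value on error), and dump_result, dump_fn, the
stubs and the SUCC_DUMP_ECODE value are names I invented here, not the real
crlib.c code.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SUCC_DUMP_ECODE 10   /* placeholder value (assumption) */

enum dump_result { DUMP_FAILED = -1, DUMP_DONE = 0, DUMP_RESTORED = 1 };

/* Stand-in for criu_dump(): 0 = dumped, 1 = restored, <0 = error. */
typedef int (*dump_fn)(void);

enum dump_result dump_application(dump_fn do_dump)
{
    pid_t pid = fork();
    if (pid < 0)
        return DUMP_FAILED;

    if (pid == 0) {
        /* Child: B at dump time, or D when running from a restored image. */
        int ret = do_dump();
        if (ret == 1)
            return DUMP_RESTORED;   /* restored: return to the caller, do not exit */
        /* Dumped (or failed): report to the waiting parent and terminate.
         * _exit() avoids flushing stdio buffers inherited from the parent. */
        _exit(ret == 0 ? SUCC_DUMP_ECODE : 1);
    }

    /* Parent (A): wait for the capturer to finish. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return DUMP_FAILED;
    if (WIFEXITED(status) && WEXITSTATUS(status) == SUCC_DUMP_ECODE)
        return DUMP_DONE;
    return DUMP_FAILED;
}

/* Stubs used only to exercise the sketch without CRIU. */
int stub_dump_ok(void)   { return 0; }
int stub_dump_fail(void) { return -1; }
```

With this shape, the caller sees DUMP_DONE in A after a capture and
DUMP_RESTORED in D after a replay, so the same code after the call can
continue in both cases.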

Ultimately, I'd want to capture and replay just one function. Do you
provide any API calls for doing such a thing? One solution might be to capture
everything, as I do right now, and then instrument the function I want so that it
exits right after execution. However, that would store a lot
of unnecessary program state in the images!

The explanation was extremely helpful. Are these explanations
somewhere in your wiki pages? A simple description of these 4 processes in
a capture and restore would be very helpful for all naive users!


>
> Now I have questions about your output and expected-output. The lines
>
>     ##########################################################
>     ##### HERE IT IS THE OUTPUT OF THE LINPACK EXECUTION #####
>     ##########################################################
>
This is the output that main_linpack.c produces. See here
<https://gist.github.com/Paschalis/a96b2747ed85b8e5a796#file-main_linpack-c-L178>.
I just replaced this output with the above 3 lines so you could read it
more easily!



>     After waitpid!
>
This is a printf that I have removed from the gist code. It was placed at
this line here
<https://gist.github.com/Paschalis/a96b2747ed85b8e5a796#file-crlib-c-L124>.


Thanks a lot for your help, Pavel.

Cheers,
Paschalis