[CRIU] Restore failed. Exit code: 43

Paschalis Mpeis paschalis.mpeis at ed.ac.uk
Tue Jan 20 04:58:18 PST 2015


(Resending this message because I had CC'ed the CRIU mailing list instead of
replying to it.)

The initialisation code, which runs before both dump and restore, is the
following:

>     int img_fd = open_img_dir("wdir/i/linpack_cr/");
>
>     criu_init_opts();
>     criu_set_service_address("./wdir/s/cs.sk");
>     criu_set_images_dir_fd(img_fd);
>     criu_set_log_level(4);


The dump code (which seems to work okay) is this:

>     int pid, ret;
>
>     // Create a child
>     pid = fork();
>     assert(pid >= 0);
>
>     if (!pid) {     // The child will dump itself
>         close(0); close(1); close(2);
>         assert(setsid() >= 0);
>         criu_set_log_file("dump.log");
>         criu_set_leave_running(true);
>         ret = criu_dump();
>         if (ret < 0) {
>             what_err_ret_mean(ret);
>             exit(1);
>         }
>         if (ret == 0)
>             ret = SUCC_DUMP_ECODE;
>         else if (ret == 1)
>             ret = SUCC_RSTR_ECODE;
>         else
>             ret = 1;
>         exit(ret);
>     } // end of child code
>
>     // Wait for the child to be captured
>     if (waitpid(pid, &ret, 0) < 0) {
>         perror("Can't wait child");
>         kill(pid, SIGKILL);
>         exit(-1);
>     }
>     if (chk_exit(ret, SUCC_DUMP_ECODE)) {
>         kill(pid, SIGKILL);
>         exit(-1);
>     }
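The child's exit status encodes what criu_dump() returned. Here is a small sketch of that mapping as a separate function. It assumes, as in libcriu, that criu_dump() returns 0 in the process that was just dumped and left running, and 1 in the copy that is later restored from the images. The values 41 and 43 are an assumption taken from CRIU's test/libcriu/test_self.c, so check the definitions in your own build:

```c
/* Assumed values, as defined in CRIU's test/libcriu/test_self.c. */
#define SUCC_DUMP_ECODE 41
#define SUCC_RSTR_ECODE 43

/*
 * Translate the return value of criu_dump() into the exit code the
 * dumping child reports (same logic as the dump code above):
 *   ret == 0 -> this process was dumped and left running
 *   ret == 1 -> this process is the copy restored from the images
 *   other    -> error, exit code 1
 */
int dump_ret_to_exit_code(int ret)
{
    if (ret == 0)
        return SUCC_DUMP_ECODE;
    if (ret == 1)
        return SUCC_RSTR_ECODE;
    return 1;
}
```

Under these assumptions, a restored process resumes inside criu_dump() with ret == 1 and therefore exits with SUCC_RSTR_ECODE, not SUCC_DUMP_ECODE.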



Initially the restore code was taken from one of your tests.
I was using:

>     criu_set_log_file("restore.log");
>     pid = criu_restore_child();
>     if (pid <= 0) {
>         what_err_ret_mean(pid);
>         exit(-1);
>     }
>
>     if (waitpid(pid, &ret, 0) < 0) {
>         perror("Can't wait for restore");
>         kill(pid, SIGKILL);
>         exit(-1);
>     }
>     return chk_exit(ret, SUCC_DUMP_ECODE);


chk_exit was printing the "exit 43" message. It is the function found
here:
https://github.com/xemul/criu/blob/master/test/libcriu/lib.c
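For reference, here is a sketch of a chk_exit()-style check. It was written from the behaviour of the function in test/libcriu/lib.c and is not a copy; the name chk_exit_sketch is hypothetical. It also suggests one likely reading of the failure. Assuming SUCC_RSTR_ECODE is 43, as in test_self.c, "exit 43" would mean that the restored child ran to completion and exited with SUCC_RSTR_ECODE, while the restore code above compared its status against SUCC_DUMP_ECODE:

```c
#include <stdio.h>
#include <sys/wait.h>

/* Assumed values, as defined in CRIU's test/libcriu/test_self.c. */
#define SUCC_DUMP_ECODE 41
#define SUCC_RSTR_ECODE 43

/*
 * Return 0 if the waitpid() status says the child exited with `want`;
 * otherwise print a short diagnostic and return 1.
 */
int chk_exit_sketch(int status, int want)
{
    if (WIFEXITED(status)) {
        if (WEXITSTATUS(status) == want)
            return 0;
        printf("   `- FAIL (exit %d)\n", WEXITSTATUS(status));
    } else if (WIFSIGNALED(status)) {
        printf("   `- FAIL (die %d)\n", WTERMSIG(status));
    } else {
        printf("   `- FAIL (%#x)\n", (unsigned)status);
    }
    return 1;
}
```

If that reading is right, the restore path would check the restored child with chk_exit(ret, SUCC_RSTR_ECODE) instead of SUCC_DUMP_ECODE.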


Then I simplified the restore code to just:

> criu_set_log_file("restore.log");
> criu_restore();


This produces a similar restore.log, with success messages, but the
program does not seem to continue.
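The child closed descriptors 0, 1 and 2 before dumping, so the restored copy has no terminal output. That makes "does not seem to continue" hard to judge from the console. Also, as far as I understand libcriu, criu_restore() (unlike criu_restore_child()) does not make the restored task a child of the caller, so waitpid() cannot be used on it; on success it returns the PID of the restored task. Here is a sketch of a liveness check for such a non-child PID, using kill(pid, 0). The helper name process_alive is hypothetical:

```c
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical helper: report whether a PID currently exists.
 * kill() with signal 0 performs the existence and permission checks
 * without delivering any signal. EPERM means the process exists but
 * we are not allowed to signal it. An unreaped zombie still counts
 * as existing.
 */
bool process_alive(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return true;
    return errno == EPERM;
}
```

For example, one could poll process_alive() with the positive value returned by criu_restore() to see whether the restored benchmark is still running.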

Thanks for your replies.
I haven't found any examples other than those in the test directory; that's
why I based my code on them.

Cheers,
Paschalis

On Tue, Jan 20, 2015 at 12:45 PM, Paschalis Mpeis <paschalis.mpeis at ed.ac.uk>
wrote:

> On Tue Jan 20 2015 at 12:22:15 PM Cyrill Gorcunov <gorcunov at gmail.com>
> wrote:
>
>> On Tue, Jan 20, 2015 at 03:06:39PM +0300, Pavel Emelyanov wrote:
>> > On 01/19/2015 10:31 PM, Paschalis Mpeis wrote:
>> > > I am trying to capture and replay a simple benchmark application.
>> > > The application accepts an integer command-line argument (CLA) that
>> > > denotes whether we are capturing or restoring.
>> > >
>> > > On both capture and restore, I run the CRIU initialisation stuff
>> > > (provide the folder for images, etc.).
>> > >
>> > > I first run the application with the capture value. I set the
>> > > leave_running option to true and do the capture, which seems
>> > > successful; the application then continues execution and finishes.
>> > >
>> > > Then I want to replay the application from the point where it was
>> > > checkpointed. So I run the application again, passing the restore
>> > > value as the CLA. In this case, the application simply initialises
>> > > CRIU and then tries to restore from the existing images.
>> > >
>> > > I get the following error:
>> > > " `- FAIL (exit 43)"
>> >
>> > But that's not a CRIU message. Who prints that, and what does "exit 43"
>> > mean?
>> >
>> > > You can find attached the dump.log and restore.log.
>> >
>> > The restore.log ends with
>> >
>> > (00.025773) Restore finished successfully. Resuming tasks.
>> > (00.025795) 5084 was trapped
>> > (00.025797) `- Expecting exit
>> > (00.025804) 5084 was trapped
>> > (00.025806) 5084 is going to execute the syscall f
>> > (00.025823) 5084 was stopped
>> > (00.025836) 5084 was trapped
>> > (00.025838) 5084 is going to execute the syscall b
>> > (00.025853) 5084 was stopped
>> > (00.025857) Writing stats
>> >
>> > I.e. CRIU thinks that tasks are up and running.
>>
>> Yes, it means everything is up and fine. Paschalis, could you please
>> provide more details on your case:
>>
>>  - the testing program itself
>>  - step-by-step how you checkpointed and restored it
>>
>

