[CRIU] [PATCH] criu: Add exec-cmd option.

Thu Mar 20 07:36:17 PDT 2014

On 03/20/2014 06:25 PM, Deyan Doychev wrote:
> Hi Christopher,
> 
> On 03/20/2014 04:00 PM, Christopher Covington wrote:
>>>> In my testing, a non-zero exit code wasn't propagated to the command line on
>>>> failure.
>>>>
>>>> This is because the restore process becomes a daemon prior to restoring
>>>> the dumped process when --exec-cmd is used.
>>>>
>>>> I am not sure what the right action has to be if we fail to execute the
>>>> command as we have already restored the processes.
>>>> Should we consider this a full failure and if so - should we kill the
>>>> processes we have restored?

Good question. I really doubt we should kill them, since this creates a gap
between restore and kill during which tasks may make progress that renders
the images obsolete, e.g. if they have a live TCP connection.

>>>> Maybe the right thing to do is daemonize only when -d was given instead
>>>> of implying this option and always daemonizing. This way if -d is not
>>>> specified we will exit with failure. But please advise what we should do
>>>> with the restored processes?
>> What do the options look like in the LXC context?
>>
>> For the perf use case killing would make retrying with a corrected exec-cmd
>> string slightly easier. Letting it run would be fine too, though, since it's
>> not that much work for the user to manually kill the process before retrying.
>>
>> In my use case it's not acceptable to have an orphaned restored process
>> running in a separate PID namespace because it might alter system performance
>> undesirably. However, I can imagine other workloads where extra copies,
>> especially if they were sleeping, might not be much to worry about.
>>
>> Regards,
>> Christopher
> 
> Killing the tasks seems to be better for us as well. If we leave them
> running we have a running container that is absolutely ready for use but
> LXC does not know about it and it looks to the outside LXC world like it
> is not running.

That's the case for an LXC container. An OpenVZ container, for example, lives
w/o a task watching it, so we simply restore it with the --restore-detached
option.

> I currently can't imagine a use case where it will be a good idea to
> restore the tasks without executing the command. Anyone else?

If we're talking about restoring w/o executing _and_ w/o detaching, then the
two use cases I know of are:

1. Synchronization. People might want to restore processes and wait for them
   to finish. With the criu process sitting between whoever forked it and the
   restored tree, this is possible. If criu exits and leaves the restored tree
   behind, the criu caller cannot wait until the restored tasks finish.

2. Shell jobs. If we restore those with the --restore-detached option, they
   get reparented to init and effectively go into the background without
   stopping, thus spoiling the job control.
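
To illustrate the synchronization point in (1), here is a rough Python
sketch. It does not use criu at all: WORKER, ATTACHED and DETACHED are
made-up stand-ins for the restored tree, for an intermediary that stays
attached, and for one that detaches. It only shows the generic wait()
behaviour: the caller sees the worker's exit status only while the
intermediary stays around and passes it on.

```python
import subprocess
import sys

# Hypothetical stand-in for the restored tree: runs briefly, exits with 7.
WORKER = "import time, sys; time.sleep(0.2); sys.exit(7)"

# Stand-in for an intermediary that stays attached: it spawns the worker,
# waits for it, and exits with the worker's status.
ATTACHED = (
    "import subprocess, sys\n"
    f"sys.exit(subprocess.call([sys.executable, '-c', {WORKER!r}]))\n"
)

# Stand-in for an intermediary that detaches: it spawns the worker and
# exits immediately, leaving the worker orphaned.
DETACHED = (
    "import subprocess, sys\n"
    f"subprocess.Popen([sys.executable, '-c', {WORKER!r}])\n"
    "sys.exit(0)\n"
)


def run_intermediary(script: str) -> int:
    """Run the intermediary and return the status the caller's wait() sees."""
    return subprocess.call([sys.executable, "-c", script])


# The caller can only wait on its direct child, the intermediary.
attached_status = run_intermediary(ATTACHED)
detached_status = run_intermediary(DETACHED)

print("attached:", attached_status)
print("detached:", detached_status)
```

In the detached case the caller's wait returns as soon as the intermediary
exits, so it learns neither when the worker finishes nor that it failed.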

Thanks,
Pavel