Fwd: Re: [CRIU] Signalling processes before CRIU/after unCRIU
Pavel Emelyanov
xemul at parallels.com
Wed Oct 10 15:21:37 EDT 2012
On 10/10/2012 09:27 PM, Andrew Vagin wrote:
> On Wed, Oct 10, 2012 at 06:26:59PM +0400, Alex/AT wrote:
>>> When we dump a container should we make sure systemd knows how to talk
>>> to the rest of the zoo? This sounds ... strange.
>> Yep. And determination of "parent" may be non-trivial too. It is probably
>> best left to the user to signal the "parent" process.
>>
>> When we suspend a large tree from the parent, all the child processes
>> should be signalled before suspending, and each one should be waited on
>> to respond (or not respond). Each one may be execution-halted after its
>> response arrives, and then all the non-responding processes in the tree
>> should be execution-halted at once.
>>
>>> Frankly, I don't want to re-invent TCP for such a simple case.
>> It's not even needed. You were thinking about a library mechanism, and that
>> is the best way in terms of predictability.
>
> It's needed, because it's the easiest way to spread this functionality.
>
> Nobody wants to link an additional library, because it is unreliable
> and insecure, in particular if a standard mechanism (like dbus) is
> not used.
>
> I think we should look at the dbus documentation. If we find nothing, we
> can ask for advice on the dbus mailing list. And if all attempts fail,
> we can start to create our own mechanism.
Since this feature is not even designed yet, but already seems very
tempting, I'd first list the requirements for it.

So, what we want is to make crtools able to communicate with the app(s)
it dumps and restores in order to let the latter facilitate the former. The
only facilitation voiced so far was

a) the app may release some resources that are non-crucial to it, e.g. drop
caches, close some fds, kill helper threads, etc.

But I can see more facilitation, for example

b) the app may want to take some non-task-bound stuff with it (e.g. sysvipc
objects)
c) the app may bear with some environmental change that would otherwise be
fatal (e.g. an IP address change for a networking app, or the number of CPUs
going up or down for such beasts as the JVM) and would like crtools to be
less strict when checking the environment
d) the app may want to know that the migration took place at all
e) the app may need to find out how long it was suspended
f) the app may need to know what has changed in case "c" above

Andrey, Alex, can dbus handle all of the above?
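
To make requirements (a) to (f) above a bit more concrete, here is a rough Python sketch of the data that would have to flow in each direction, whatever transport ends up carrying it. All names here (AppHints, RestoreNotice, fatal_changes, build_notice) are made up for illustration; nothing like this exists in crtools.

```python
from dataclasses import dataclass, field


@dataclass
class AppHints:
    """Hypothetical hints an app hands to the dump tool before the dump."""
    # (a) resources the app promises to release before being dumped
    releasable: list = field(default_factory=list)
    # (b) non-task-bound objects the app wants taken along, e.g. sysvipc ids
    extra_objects: list = field(default_factory=list)
    # (c) kinds of environment change the app can bear with
    tolerated_changes: set = field(default_factory=set)


@dataclass
class RestoreNotice:
    """Hypothetical message handed to the app after restore."""
    migrated: bool            # (d) did a migration take place at all
    suspended_seconds: float  # (e) how long the app was suspended
    # (f) what has changed, as {kind: (old_value, new_value)}
    changes: dict = field(default_factory=dict)


def fatal_changes(hints, changes):
    """Return only the changes the app did NOT declare as tolerated (c)."""
    return {kind: pair for kind, pair in changes.items()
            if kind not in hints.tolerated_changes}


def build_notice(dump_time, restore_time, changes, migrated):
    """Assemble the after-restore notice covering (d), (e) and (f)."""
    return RestoreNotice(migrated=migrated,
                         suspended_seconds=restore_time - dump_time,
                         changes=dict(changes))
```

With this shape, the environment check on restore would refuse only if fatal_changes() returns something non-empty, and the app would read the RestoreNotice to adjust itself. Any candidate transport has to carry both messages.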
Thanks,
Pavel
>> So it gets to the simple way.
>>
>> Suspension:
>>
>> 1. Check whether each process in the tree (starting from the user-specified
>> process) has a "START SUSPEND" callback. If not, postpone its suspension
>> until all processes having callbacks are halted (step 3).
>> 2. For each process with the callback:
>> 2A. Flag the process as "IN SUSPEND".
>> 2B. Enter the "START SUSPEND" callback. The return code from "START SUSPEND"
>> may specify whether the process wants to do some housekeeping or be
>> postponed to step 3.
>> 2C. Wait for the "COMPLETE" callback to be called by the process. On the
>> "COMPLETE" call, execution-halt the process.
>> 2D. Processes that have not responded within the given (user-specified)
>> time go to step 3.
>> 3. Execution-halt and suspend each process postponed in steps 1 or 2D.
>>
>> Unsuspension:
>>
>> 1. For all the processes:
>> 1A. Restore the process and clear its "IN SUSPEND" mark.
>> 1B. Execution-start it. This returns from the "COMPLETE" call in every
>> process that was halted inside the "COMPLETE" callback.
>>
>> "COMPLETE" calls from processes not marked as "IN SUSPEND" should be
>> ignored. Such calls may be made by processes that timed out during
>> housekeeping in step 2D and were suspended while still housekeeping.
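
The suspension and unsuspension steps quoted above can be simulated in a few lines of Python. This is only a sketch under simplifying assumptions: processes are plain objects, the hypothetical "START SUSPEND" callback returns the housekeeping time it needs (or None to ask for postponement) instead of really running, and step 2 is walked sequentially. The names Proc, suspend, restore and complete are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Proc:
    name: str
    # Hypothetical "START SUSPEND" callback: returns the seconds of
    # housekeeping it needs, or None to be postponed to step 3.
    start_suspend: Optional[Callable[[], Optional[float]]] = None
    in_suspend: bool = False
    halted: bool = False


def suspend(tree, timeout):
    """Run suspension steps 1-3; return process names in halt order."""
    halt_order = []
    # Step 1: processes without a callback are postponed to step 3.
    postponed = [p for p in tree if p.start_suspend is None]
    for p in tree:
        if p.start_suspend is None:
            continue
        p.in_suspend = True            # step 2A
        needed = p.start_suspend()     # step 2B
        if needed is not None and needed <= timeout:
            p.halted = True            # step 2C: "COMPLETE" came in time
            halt_order.append(p.name)
        else:
            postponed.append(p)        # postponed in 2B, or timed out (2D)
    for p in postponed:                # step 3
        p.halted = True
        halt_order.append(p.name)
    return halt_order


def restore(tree):
    """Unsuspension steps 1A and 1B for every process."""
    for p in tree:
        p.in_suspend = False           # 1A: clear the "IN SUSPEND" mark
        p.halted = False               # 1B: execution-start


def complete(p):
    """Handle a "COMPLETE" call; ignore it if p is not "IN SUSPEND"."""
    if not p.in_suspend:
        return False                   # late call after restore: ignored
    p.halted = True
    return True
```

For example, with a tree of one process without a callback, one that finishes housekeeping in time, one that times out and one that asks to be postponed, suspend() halts the responsive process first and the other three in step 3. After restore(), a late complete() from the timed-out process returns False, which matches the "should be ignored" rule.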
>>
>>
>> --
>> Regards,
>> Alexey Asemov
>> _______________________________________________
>> CRIU mailing list
>> CRIU at openvz.org
>> https://openvz.org/mailman/listinfo/criu