<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /></head><body style='font-size: 10pt; font-family: Verdana,Geneva,sans-serif'>
<p>Hello Alex,</p>
<p>Thanks again for your time and response. It is now all clear to me.</p>
<p>I am amazed at the work put in to ensure CRIU does everything right.</p>
<p><br /></p>
<p>Best regards,</p>
<p>Dorian Goepp</p>
<p>On 2022-07-07 16:54, Alexander Mikhalitsyn wrote:</p>
<blockquote type="cite" style="padding: 0 0.4em; border-left: #1010ff 2px solid; margin: 0">
<div class="pre" style="margin: 0; padding: 0; font-family: monospace">Hello, Dorian!<br /> <br />
<blockquote type="cite" style="padding: 0 0.4em; border-left: #1010ff 2px solid; margin: 0">There is one thing I would just like to be sure I have understood properly from your message. Do you mean that the process will redo the futex_wait system calls on restoration?</blockquote>
Yes.<br /> <br /> In more detail:<br /> when CRIU dumps processes, it uses ptrace() to "seize" them.<br /> This procedure acts like a signal on the processes, so (almost) all<br /> syscalls that the processes being dumped were executing<br /> at the moment of the dump get interrupted.<br /> You can observe this behaviour if you write a "buggy" program that<br /> uses the sleep() call incorrectly and does not handle EINTR. In this<br /> case the program will sleep for less time than it should.<br /> For the futex() syscall, however, the handling is different: CRIU will restart the<br /> syscall for you, because on the kernel side futex returns<br /> -ERESTARTSYS, which CRIU handles in compel_get_task_regs() and which<br /> leads to an "automatic" syscall restart.<br /> In the case of the nanosleep syscall, the kernel returns<br /> -ERESTART_RESTARTBLOCK, which means that the kernel should not perform the<br /> autorestart for this syscall, so in this case CRIU will "fix up" the<br /> "ax" register to the -EINTR value, to be fully transparent and mimic the<br /> generic kernel behavior for this group of syscalls.<br /> <br /> The general idea here is to be fully invisible to userspace. 
If<br /> a syscall has to return EINTR on a signal, then CRIU will do the same; if<br /> a syscall has to be restarted after a signal handler has run, then<br /> CRIU will restore the process with the syscall restarted.<br /> <br /> I've omitted the details about the SA_RESTART flag; they are not so important<br /> for understanding the basic idea here :)<br /> You can refer to the handle_signal() kernel function to get a better<br /> understanding of the details.<br /> <br /> References:<br /> <a href="https://github.com/torvalds/linux/blob/8cb1ae19bfae92def42c985417cd6e894ddaa047/kernel/futex/waitwake.c#L670">https://github.com/torvalds/linux/blob/8cb1ae19bfae92def42c985417cd6e894ddaa047/kernel/futex/waitwake.c#L670</a><br /> <a href="https://github.com/torvalds/linux/blob/8cb1ae19bfae92def42c985417cd6e894ddaa047/kernel/futex/waitwake.c#L552">https://github.com/torvalds/linux/blob/8cb1ae19bfae92def42c985417cd6e894ddaa047/kernel/futex/waitwake.c#L552</a><br /> <a href="https://github.com/torvalds/linux/blob/d6ecaa0024485effd065124fe774de2e22095f2d/arch/x86/kernel/signal.c#L796">https://github.com/torvalds/linux/blob/d6ecaa0024485effd065124fe774de2e22095f2d/arch/x86/kernel/signal.c#L796</a><br /> the signal(7) man page<br /> <br /> Best regards,<br /> Alex<br /> <br /> On Thu, Jul 7, 2022 at 5:11 PM Dorian Goepp &lt;<a href="mailto:goepp@i3s.unice.fr">goepp@i3s.unice.fr</a>&gt; wrote:
<blockquote type="cite" style="padding: 0 0.4em; border-left: #1010ff 2px solid; margin: 0"><br /> Hello,<br /> <br /> <br /> Thanks a lot Alex for your response.<br /> <br /> There is one thing I would just like to be sure I have understood properly from your message. Do you mean that the process will redo the futex_wait system calls on restoration?<br /> <br /> Best regards,<br /> <br /> Dorian Goepp<br /> <br /> On 2022-07-07 14:34, Alexander Mikhalitsyn wrote:<br /> <br /> Hello, dear friends,<br /> <br /> Yep, CRIU definitely handles futexes carefully. It's one of the most<br /> important things.<br /> We support both robust and regular futexes.<br /> <br /> As you know, futexes work only on a shared-memory basis, so for<br /> non-robust futexes<br /> there is no need for any special handling. We just dump the<br /> whole process memory contents.<br /> So after the restore we just need to ensure that the threads' instruction<br /> pointers (IP) are properly set<br /> (for instance, we need to perform a manual "syscall restart" for futexes;<br /> see compel_get_task_regs()).<br /> <br /> For robust futexes we have special handling here:<br /> <a href="https://github.com/checkpoint-restore/criu/blob/c8f9880adab038481f7806173b698fc6e17ba76a/criu/cr-dump.c#L565">https://github.com/checkpoint-restore/criu/blob/c8f9880adab038481f7806173b698fc6e17ba76a/criu/cr-dump.c#L565</a><br /> <a href="https://github.com/checkpoint-restore/criu/blob/c8f9880adab038481f7806173b698fc6e17ba76a/criu/pie/restorer.c#L532">https://github.com/checkpoint-restore/criu/blob/c8f9880adab038481f7806173b698fc6e17ba76a/criu/pie/restorer.c#L532</a><br /> <br /> Regards,<br /> Alex<br /> <br /> On Thu, Jul 7, 2022 at 3:13 PM Adrian Reber &lt;<a href="mailto:adrian@lisas.de">adrian@lisas.de</a>&gt; wrote:<br /> <br /> <br /> Please try to submit your question as a GitHub issue. 
There is a much higher<br /> chance of getting an answer there.<br /> <br /> Adrian<br /> <br /> On Thu, Jul 07, 2022 at 01:39:17PM +0200, Dorian Goepp wrote:<br /> <br /> Hi,<br /> <br /> Is this the right place to ask questions about CRIU's features and<br /> internals?<br /> <br /> If so, I have been considering CRIU for dumping and restoring a process<br /> with a set of threads synchronised through futexes [<a href="https://man7.org/linux/man-pages/man2/futex.2.html">1</a>]. It seems to<br /> work, but I cannot tell from the documentation (wiki) whether it is<br /> officially supported or just accidentally works for the cases I tested<br /> it with.<br /> <br /> I could not find which parts of CRIU's source code take care of the state of the<br /> futex_wait system call, except maybe for<br /> `get_task_regs()` in criu/compel/arch/x86/src/lib/infect.c (I run it on<br /> an amd64 processor). This function issues the warning "Will restore %d<br /> with interrupted system call" when I dump the futex-heavy process. Is it<br /> enough to save the process's register state to resume the futex system<br /> call correctly?<br /> <br /> Best regards,<br /> <br /> Dorian Goepp<br /> <br /> Links:<br /> ------<br /> [1] <a href="https://man7.org/linux/man-pages/man2/futex.2.html">https://man7.org/linux/man-pages/man2/futex.2.html</a><br /> <br /> <br /> _______________________________________________<br /> CRIU mailing list<br /> <a href="mailto:CRIU@openvz.org">CRIU@openvz.org</a><br /> <a href="https://lists.openvz.org/mailman/listinfo/criu">https://lists.openvz.org/mailman/listinfo/criu</a></blockquote>
</div>
</blockquote>
</body></html>