<div dir="ltr"><div>OK.</div><div>Thank you so much for all the information.</div><div>Best regards. <br></div></div><div class="gmail_extra"><br><div class="gmail_quote">2018-06-24 17:28 GMT+01:00 Adrian Reber <span dir="ltr"><<a href="mailto:adrian@lisas.de" target="_blank">adrian@lisas.de</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Sun, Jun 24, 2018 at 02:05:46PM +0100, Thouraya TH wrote:<br>
> 2018-06-23 19:49 GMT+01:00 Thouraya TH <<a href="mailto:thouraya87@gmail.com">thouraya87@gmail.com</a>>:<br>
> <br>
> > 2018-06-23 18:44 GMT+01:00 Adrian Reber <<a href="mailto:adrian@lisas.de">adrian@lisas.de</a>>:<br>
> ><br>
> >> On Sat, Jun 23, 2018 at 02:11:10PM +0100, Thouraya TH wrote:<br>
> >> > *MPI applications would be to be aware of the communication that is going<br>
> >> > on and try to restore that communication state after the process restore.*<br>
> >> ><br>
> >> > This is about the MPI library <a href="https://www.open-mpi.org/" rel="noreferrer" target="_blank">https://www.open-mpi.org/</a><br>
> >> > 1) Running HPC applications in containers is gaining significant interest<br>
> >> > due to the lightweight virtualisation of containers versus VMs (as far as I know).<br>
> >><br>
> >> Not sure if you are actually asking something here or not. What you<br>
> >> need to be concerned about are messages which are in flight,<br>
> >> especially if the underlying technology is reliable and does not<br>
> >> lose any messages, so the application never resends one. This can lead<br>
> >> to a situation like this (theoretically):<br>
> >><br>
> >> * Side A sends message M1 to side B<br>
> >> * Side B is checkpointed and stopped before receiving message M1<br>
> >> * Side A waits for an answer from side B (message M2)<br>
> >> * Network discards the message M1 as receiver on side B is gone<br>
> >> * Side B is restored and waits forever for message M1<br>
> >><br>
> >> * Side B now keeps waiting for message M1<br>
> >> * Side A keeps waiting for message M2<br>
> >><br>
> >> And now both sides are waiting forever and you would need to replay<br>
> >> message M1.<br>
> >><br>
> >> So coordinated checkpointing, where you can make sure that no messages<br>
> >> are currently in flight, would make it easier.<br>
> >><br>
> ><br>
</div></div>> > *OK, can we use coordinated or uncoordinated checkpointing with CRIU?*<br>
> ><br>
> OpenMPI<br>
> <br>
> Status: stalled<br>
> <br>
<span class="">> - Adrian Reber did a first version of the patches <<a href="https://lisas.de/~adrian/open-mpi.git/" rel="noreferrer" target="_blank">https://lisas.de/~adrian/<wbr>open-mpi.git/</a>><br>
> <br>
> I see that this is your version. Is it coordinated or uncoordinated<br>
> checkpointing? Kind regards.<br>
<br>
</span>Actually, I do not remember what Open MPI does with regard to<br>
checkpointing, or how. I am guessing it is uncoordinated.<br>
<br>
At this point in time, if you want to use CRIU with one of the MPI<br>
implementations, you have to implement the CRIU support yourself.<br>
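<br>
To make the M1/M2 scenario concrete, here is a toy Python model. It is not real CRIU or MPI code: Process, Network, checkpoint and restore are hypothetical stand-ins. It contrasts a checkpoint of side B taken while M1 is still in flight with one taken after all in-flight messages have been drained first, which is the idea behind coordinated checkpointing:<br>

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Process:
    # Hypothetical stand-in for one MPI rank.
    name: str
    inbox: deque = field(default_factory=deque)  # delivered, not yet consumed


@dataclass
class Network:
    # Messages sent but not yet delivered: (destination, payload).
    in_flight: deque = field(default_factory=deque)

    def send(self, dst, payload):
        self.in_flight.append((dst, payload))

    def deliver_all(self, procs):
        # Drain the channels: deliver everything that is still in flight.
        while self.in_flight:
            dst, payload = self.in_flight.popleft()
            procs[dst].inbox.append(payload)


def checkpoint(proc):
    # Hypothetical stand-in for a CRIU dump: copy the process state,
    # including messages already delivered to it.
    return list(proc.inbox)


def restore(name, snapshot):
    return Process(name, deque(snapshot))


def uncoordinated(net, procs):
    snap = checkpoint(procs["B"])  # B is dumped while M1 is still in flight
    net.in_flight.clear()          # receiver is gone: the network discards M1
    return restore("B", snap)


def coordinated(net, procs):
    net.deliver_all(procs)         # make sure no messages are in flight
    snap = checkpoint(procs["B"])
    return restore("B", snap)


def run(strategy):
    net = Network()
    procs = {"A": Process("A"), "B": Process("B")}
    net.send("B", "M1")            # A sends M1, then waits for M2
    b = strategy(net, procs)
    return "M1" in b.inbox         # can the restored B ever see M1?


print(run(uncoordinated))  # False: B waits for M1 forever, A waits for M2
print(run(coordinated))    # True
```

A real implementation would also have to stop all ranks from sending new messages before draining the channels, which this sketch does not model.<br>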
<span class="HOEnZb"><font color="#888888"><br>
Adrian<br>
</font></span></blockquote></div><br></div>