[CRIU] New CRIU plugin for Slurm

Manuel Rodríguez Pascual manuel.rodriguez.pascual at gmail.com
Thu Sep 1 01:33:22 PDT 2016


Well, this may be a bit complex, as you have to install Slurm from
source, and in particular from the GitHub branch I pointed to before
(where this plugin is stored). Getting it up and running can be a bit
laborious the first time.

When compiling Slurm there are two options: install it on a cluster
(more than one node) or on a single node. The single-node setup is
simpler because you don't need shared storage and a few other things,
so it is probably the best option for you. It is enabled with the
"--enable-multiple-slurmd" configure flag.
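
As a rough illustration of the single-node build, here is a minimal
sketch. It only prints the steps unless DRY_RUN=0 is set. PREFIX,
SRC_DIR, DRY_RUN and the run helper are hypothetical names chosen for
this example; the clone URL is inferred from the branch URL given
earlier in the thread, and the checkout is assumed to ship a ready
"configure" script.

```shell
#!/bin/sh
# Sketch only: prints the build steps; set DRY_RUN=0 to really execute them.
# PREFIX and SRC_DIR are assumed locations, not values from this thread.
DRY_RUN="${DRY_RUN:-1}"
PREFIX="${PREFIX:-$HOME/slurm-criu}"
SRC_DIR="${SRC_DIR:-$HOME/src/slurm}"

# Echo the command in dry-run mode, otherwise run it in the current shell.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_steps() {
    # Fetch the branch that contains the CRIU checkpoint plugin.
    run git clone --branch criuPlugin https://github.com/supermanue/slurm.git "$SRC_DIR"
    run cd "$SRC_DIR"
    # Single-node build. If CRIU is not in its default location, the
    # branch's "--with-criu" flag could be added here (exact argument
    # form not shown in this thread).
    run ./configure --prefix="$PREFIX" --enable-multiple-slurmd
    run make
    run make install
}

build_steps
```

Running it as is shows the five commands; review them before
switching DRY_RUN to 0.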

I have my installation notes here:
https://github.com/ciemat-tic/codec/wiki/Slurm-cluster . They are
probably incomplete and a bit outdated, but they might help you in
the process. Just keep in mind that for this project "master" and
"computer node" would be the same machine, and that you don't have to
install MPI, a database, or NFS.

Once that is ready, you can submit a serial task to Slurm, checkpoint
it with "scontrol checkpoint create <task id>", and restart it with
"scontrol checkpoint restart <task id>".
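
The checkpoint/restart cycle can be sketched as a small script. It is
a dry run by default and only prints the two scontrol commands quoted
above. The run helper, DRY_RUN, checkpoint_cycle and the id 12345 are
hypothetical names and values for illustration; the real id is the one
reported when the task is submitted.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands by default; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Checkpoint one task and then restart it from that checkpoint.
checkpoint_cycle() {
    task_id="$1"
    run scontrol checkpoint create "$task_id"
    run scontrol checkpoint restart "$task_id"
}

# 12345 is a placeholder; pass the real task id as the first argument.
checkpoint_cycle "${1:-12345}"
```

In practice you would wait for the checkpoint to finish before
issuing the restart; the sketch skips that for brevity.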

If you decide to give this a try, feel free to write me off-list
whenever you get stuck on anything, as setting up a Slurm cluster may
seem straightforward but is in fact quite tricky sometimes. And if you
need anything else, just let me know.

best regards,

Manuel


2016-08-31 16:34 GMT+02:00 Pavel Emelyanov <xemul at virtuozzo.com>:
> On 08/30/2016 03:16 PM, Manuel Rodríguez Pascual wrote:
>> Hi all,
>>
>> After working together with CRIU developers, my team at CIEMAT has
>> developed a CRIU plugin for the Slurm workload manager
>> (http://slurm.schedmd.com/slurm.html). This way, Slurm can employ this
>> checkpoint/restart library to perform these operations.
>>
>> It is stored in my personal github account,
>> https://github.com/supermanue/slurm/tree/criuPlugin , as a branch of a
>> fairly new Slurm version.
>>
>> Regarding the code, it is basically a clone of the BLCR plugin adapted
>> to CRIU's requirements and functionality. It comprises:
>>
>> - the plugin itself, stored in src/plugins/checkpoint/criu
>> - a new "--with-criu" compilation flag (plus the related files) so a
>> user can specify criu location if it is not the default one
>> - a modification in the SPANK behaviour (spank.h and plugstack.c) so a
>> spank plugin can get the location of the Slurm checkpoint folder
>> calling spank_get_item with "S_CHECKPOINT_DIR"
>> - some minor changes in other compilation-related files
>>
>> We hope that this can be useful for the Slurm community.  Feel free to
>> test and use it :) And of course, any feedback (comment, criticism) is
>> welcome.
>
> Awesome! Thanks a lot for this work!
>
> Would you point us to some HOWTO describing the simplest way to check
> how Slurm and C/R functionality work together?
>
> -- Pavel


