Title: | Submit R Calculations to a 'Slurm' Cluster |
---|---|
Description: | Functions that simplify submitting R scripts to a 'Slurm' workload manager, in part by automating the division of embarrassingly parallel calculations across cluster nodes. |
Authors: | Philippe Marchand [aut], Ian Carroll [aut], Mike Smorul [aut], Rachael Blake [aut], Quentin Read [aut], Dayne Filer [ctb], Ben Fasoli [ctb], Pol van Rijn [ctb], Sebastian Schubert [ctb], Rob Gilmore [ctb], Christopher Barrington [ctb], Se Jong Cho [art], Erick Verleye [cre] |
Maintainer: | Erick Verleye <[email protected]> |
License: | GPL-3 |
Version: | 0.6.2 |
Built: | 2025-02-14 05:42:19 UTC |
Source: | https://github.com/earthlab/rslurm |
rslurm

Send long-running or parallel jobs to a Slurm workload manager (i.e. cluster) using the `slurm_call`, `slurm_apply`, or `slurm_map` functions.
This package includes three core functions used to send computations to a Slurm cluster: 1) `slurm_call` executes a function using a single set of parameters (passed as a list), 2) `slurm_apply` evaluates a function in parallel for each row of parameters in a given data frame, and 3) `slurm_map` evaluates a function in parallel for each element of a list.
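A minimal sketch of the three entry points (hypothetical function and inputs; submitting requires access to a Slurm cluster):

```r
library(rslurm)

# A toy function to run remotely (hypothetical example)
add_two <- function(a, b) a + b

# 1) slurm_call: a single evaluation, parameters passed as a named list
job1 <- slurm_call(add_two, params = list(a = 1, b = 2))

# 2) slurm_apply: one evaluation per row of a parameter data frame
pars <- data.frame(a = 1:10, b = 10:1)
job2 <- slurm_apply(add_two, pars)

# 3) slurm_map: one evaluation per list element
job3 <- slurm_map(as.list(1:10), function(x) x^2)
```

Each call returns a `slurm_job` object that can then be passed to `get_slurm_out` or the other helper functions.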
The functions `slurm_apply` and `slurm_map` automatically split the parameter rows or list elements into equal-size chunks, each chunk to be processed by a separate cluster node. They use functions from the `parallel` package to parallelize computations across processors on a given node.

The output of `slurm_apply`, `slurm_map`, or `slurm_call` is a `slurm_job` object that serves as an input to the other functions in the package: `print_job_status`, `cancel_slurm`, `get_slurm_out` and `cleanup_files`.
To be compatible with `slurm_apply`, a function may accept any number of single-value parameters. The names of these parameters must match the column names of the `params` data frame supplied. There are no restrictions on the types of parameters passed as a list to `slurm_call` or `slurm_map`.

If the function passed to `slurm_call` or `slurm_apply` requires knowledge of any R objects (data, custom helper functions) besides `params`, a character vector corresponding to their names should be passed to the optional `global_objects` argument.
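For instance (a hypothetical sketch; the object names are illustrative and a running Slurm cluster is assumed), a helper function and a data object used inside the applied function can be shipped to the nodes via `global_objects`:

```r
library(rslurm)

# Hypothetical objects needed by the applied function
ref_table <- data.frame(id = 1:3, weight = c(0.2, 0.3, 0.5))
lookup_weight <- function(id) ref_table$weight[ref_table$id == id]

f <- function(id) lookup_weight(id) * 100

# List both names so each node loads them before calling f
sjob <- slurm_apply(f, data.frame(id = 1:3),
                    global_objects = c("ref_table", "lookup_weight"))
```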
When parallelizing a function, any error will interrupt all calculations for the current node, so it may be useful to wrap expressions that may generate errors in a `try` or `tryCatch` call. This ensures the computation continues with the next parameter set after reporting the error.
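For example, a wrapper along these lines (a sketch; `risky_model` is a hypothetical function) records the error and lets the node continue with the remaining parameter sets:

```r
safe_f <- function(x) {
  tryCatch(
    risky_model(x),  # may fail for some inputs
    error = function(e) {
      message("Error for x = ", x, ": ", conditionMessage(e))
      NA  # placeholder result so the computation continues
    }
  )
}
```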
The default output format for `get_slurm_out` (`outtype = "raw"`) is a list where each element is the return value of one function call. If the function passed to `slurm_apply` produces a vector output, you may use `outtype = "table"` to collect the output in a single data frame, with one row per function call.
Advanced options for the Slurm workload manager may accompany job submission by `slurm_call`, `slurm_map`, and `slurm_apply` through the optional `slurm_options` argument. For example, passing `list(time = '1:30:00')` for this option limits the job to 1 hour and 30 minutes. Some advanced configuration must be set through environment variables. On a multi-cluster head node, for example, the SLURM_CLUSTERS environment variable must be set to direct jobs to a non-default cluster.
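A sketch combining both mechanisms (hypothetical cluster and partition names; the available options depend on your site's Slurm configuration):

```r
library(rslurm)

# Direct jobs to a non-default cluster on a multi-cluster head node
Sys.setenv(SLURM_CLUSTERS = "secondary")

sjob <- slurm_apply(ftest, pars,
                    slurm_options = list(time = "1:30:00",       # 1 h 30 min limit
                                         partition = "normal"))  # hypothetical partition
```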
## Not run: 
# Create a data frame of mean/sd values for normal distributions
pars <- data.frame(par_m = seq(-10, 10, length.out = 1000),
                   par_sd = seq(0.1, 10, length.out = 1000))

# Create a function to parallelize
ftest <- function(par_m, par_sd) {
    samp <- rnorm(10^7, par_m, par_sd)
    c(s_m = mean(samp), s_sd = sd(samp))
}

sjob1 <- slurm_apply(ftest, pars)
print_job_status(sjob1)
res <- get_slurm_out(sjob1, "table")
all.equal(pars, res) # Confirm correct output
cleanup_files(sjob1)
## End(Not run)
This function cancels the specified Slurm job by invoking the Slurm `scancel` command. It does not delete the temporary files (e.g. scripts) created by `slurm_apply` or `slurm_call`. Use `cleanup_files` to remove those files.
cancel_slurm(slr_job)
slr_job | A `slurm_job` object. |
This function deletes all temporary files associated with the specified Slurm job, including files created by `slurm_apply` or `slurm_call`, as well as outputs from the cluster. These files should be located in the _rslurm_[jobname] folder of the current working directory.
cleanup_files(slr_job, wait = TRUE)
slr_job | A `slurm_job` object. |
wait | Specify whether to block until the job completes before deleting its files. |
## Not run: 
sjob <- slurm_apply(func, pars)
print_job_status(sjob) # Prints console/error output once job is completed.
func_result <- get_slurm_out(sjob, "table") # Loads output data into R.
cleanup_files(sjob)
## End(Not run)
This function returns the completion status of a Slurm job, its queue status (if any), and its log output.
get_job_status(slr_job)
slr_job | A `slurm_job` object. |
The `queue` element of the output is a data frame matching the output of the Slurm `squeue` command for that job; it will only indicate portions of the job that are running or in queue. The `log` element is a vector of the contents of the console/error output files for each node where the job is running.

A list with three elements: `completed` is a logical value indicating if all portions of the job have completed or stopped, `queue` contains the information on job elements still in queue, and `log` contains the console/error logs.
This function reads all function output files (one per cluster node used) from the specified Slurm job and returns the result in a single data frame (if "table" format selected) or a list (if "raw" format selected). It doesn't record any messages (including warnings or errors) output to the R console during the computation; these can be consulted by invoking `print_job_status`.
get_slurm_out(slr_job, outtype = "raw", wait = TRUE, ncores = NULL)
slr_job | A `slurm_job` object. |
outtype | Can be "table" or "raw"; see "Value" below for details. |
wait | Specify whether to block until the job completes before reading its output. |
ncores | (optional) If not NULL, the number of cores passed to `mclapply`. |
The `outtype` option is only relevant for jobs submitted with `slurm_apply`. Jobs sent with `slurm_call` only return a single object, and setting `outtype = "table"` creates an error in that case.

If `outtype = "table"`: A data frame with one column per return value of the function passed to `slurm_apply`, where each row is the output of the corresponding row in the params data frame passed to `slurm_apply`.

If `outtype = "raw"`: A list where each element is the output of the function passed to `slurm_apply` for the corresponding row in the params data frame passed to `slurm_apply`.
Run a previously created `slurm_job` object locally instead of on a Slurm cluster.
local_slurm_array(slr_job, rscript_path = NULL)
slr_job | An object of class `slurm_job`. |
rscript_path | The location of the Rscript command. If not specified, defaults to the location of Rscript within the R installation being run. |
This function is most useful for testing your function on a reduced dataset before submitting the full job to the Slurm cluster.
Call `local_slurm_array` on a `slurm_job` object created with `slurm_apply(..., submit = FALSE)` or `slurm_map(..., submit = FALSE)`. The job will run serially on the local system rather than being submitted to the Slurm cluster.
## Not run: 
sjob <- slurm_apply(func, pars, submit = FALSE)
local_slurm_array(sjob)
func_result <- get_slurm_out(sjob, "table") # Loads output data into R.
cleanup_files(sjob)
## End(Not run)
Use `slurm_apply` to compute a function over multiple sets of parameters in parallel, spread across multiple nodes of a Slurm cluster, with syntax similar to `mapply`.
slurm_apply(f, params, ..., jobname = NA, nodes = 2, cpus_per_node = 2,
    processes_per_node = cpus_per_node, preschedule_cores = TRUE,
    job_array_task_limit = NULL, global_objects = NULL, add_objects = NULL,
    pkgs = rev(.packages()), libPaths = NULL, rscript_path = NULL,
    r_template = NULL, sh_template = NULL, slurm_options = list(),
    submit = TRUE)
f | A function that accepts one or many single values as parameters and may return any type of R object. |
params | A data frame of parameter values to apply `f` to. |
... | Additional arguments to `f`. |
jobname | The name of the Slurm job; if `NA`, it is assigned a random name. |
nodes | The (maximum) number of cluster nodes to spread the calculation over. |
cpus_per_node | The number of CPUs requested per node. This argument is mapped to the Slurm parameter `cpus-per-task`. |
processes_per_node | The number of logical CPUs to utilize per node, i.e. how many processes to run in parallel per node. This can exceed `cpus_per_node`. |
preschedule_cores | Corresponds to the `mc.preschedule` argument of `mclapply`. |
job_array_task_limit | The maximum number of job array tasks to run at the same time. Defaults to `NULL` (no limit). |
global_objects | A character vector containing the names of R objects to be saved in a .RData file and loaded on each cluster node prior to calling `f`. |
add_objects | Older deprecated name of `global_objects`. |
pkgs | A character vector containing the names of packages that must be loaded on each cluster node. By default, it includes all packages loaded by the user when `slurm_apply` is called. |
libPaths | A character vector describing the location of additional R library trees to search through, or NULL. The default value of NULL corresponds to libraries returned by `.libPaths()` on the cluster node. |
rscript_path | The location of the Rscript command. If not specified, defaults to the location of Rscript within the R installation being run. |
r_template | The path to the template file for the R script run on each node. If NULL, uses the default template "rslurm/templates/slurm_run_R.txt". |
sh_template | The path to the template file for the sbatch submission script. If NULL, uses the default template "rslurm/templates/submit_sh.txt". |
slurm_options | A named list of options recognized by `sbatch`. |
submit | Whether or not to submit the job to the cluster with `sbatch`. |
This function creates a temporary folder ("_rslurm_[jobname]") in the current directory, holding .RData and .RDS data files, the R script to run and the Bash submission script generated for the Slurm job.
The set of input parameters is divided into equal chunks sent to each node, and `f` is evaluated in parallel within each node using functions from the `parallel` R package. The names of any other R objects (besides `params`) that `f` needs to access should be included in `global_objects` or passed as additional arguments through `...`.
Use `slurm_options` to set any option recognized by `sbatch`, e.g. `slurm_options = list(time = "1:00:00", share = TRUE)`. See http://slurm.schedmd.com/sbatch.html for details on possible options. Note that full names must be used (e.g. "time" rather than "t") and that flags (such as "share") must be specified as TRUE. The "array", "job-name", "nodes", "cpus-per-task" and "output" options are already determined by `slurm_apply` and should not be manually set.
When processing the computation job, the Slurm cluster will output two types of files in the temporary folder: those containing the return values of the function for each subset of parameters ("results_[node_id].RDS") and those containing any console or error output produced by R on each node ("slurm_[node_id].out").
If `submit = TRUE`, the job is sent to the cluster and a confirmation message (or error) is output to the console. If `submit = FALSE`, a message indicates the location of the saved data and script files; the job can be submitted manually by running the shell command `sbatch submit.sh` from that directory.
After sending the job to the Slurm cluster, `slurm_apply` returns a `slurm_job` object which can be used to cancel the job, get the job status or output, and delete the temporary files associated with it. See the description of the related functions for more details.

A `slurm_job` object containing the `jobname` and the number of `nodes` effectively used.
`slurm_call` to evaluate a single function call. `slurm_map` to evaluate a function over a list. `cancel_slurm`, `cleanup_files`, `get_slurm_out` and `get_job_status`, which use the output of this function.
## Not run: 
sjob <- slurm_apply(func, pars)
get_job_status(sjob) # Prints console/error output once job is completed.
func_result <- get_slurm_out(sjob, "table") # Loads output data into R.
cleanup_files(sjob)
## End(Not run)
Use `slurm_call` to perform a single function evaluation on the Slurm cluster.
slurm_call(f, params = list(), jobname = NA, global_objects = NULL,
    add_objects = NULL, pkgs = rev(.packages()), libPaths = NULL,
    rscript_path = NULL, r_template = NULL, sh_template = NULL,
    slurm_options = list(), submit = TRUE)
f | Any R function. |
params | A named list of parameters to pass to `f`. |
jobname | The name of the Slurm job; if `NA`, it is assigned a random name. |
global_objects | A character vector containing the names of R objects to be saved in a .RData file and loaded on each cluster node prior to calling `f`. |
add_objects | Older deprecated name of `global_objects`. |
pkgs | A character vector containing the names of packages that must be loaded on each cluster node. By default, it includes all packages loaded by the user when `slurm_call` is called. |
libPaths | A character vector describing the location of additional R library trees to search through, or NULL. The default value of NULL corresponds to libraries returned by `.libPaths()` on the cluster node. |
rscript_path | The location of the Rscript command. If not specified, defaults to the location of Rscript within the R installation being run. |
r_template | The path to the template file for the R script run on each node. If NULL, uses the default template "rslurm/templates/slurm_run_single_R.txt". |
sh_template | The path to the template file for the sbatch submission script. If NULL, uses the default template "rslurm/templates/submit_single_sh.txt". |
slurm_options | A named list of options recognized by `sbatch`. |
submit | Whether or not to submit the job to the cluster with `sbatch`. |
This function creates a temporary folder ("_rslurm_[jobname]") in the current directory, holding .RData and .RDS data files, the R script to run and the Bash submission script generated for the Slurm job.
The names of any other R objects (besides `params`) that `f` needs to access should be listed in the `global_objects` argument.
Use `slurm_options` to set any option recognized by `sbatch`, e.g. `slurm_options = list(time = "1:00:00", share = TRUE)`. See http://slurm.schedmd.com/sbatch.html for details on possible options. Note that full names must be used (e.g. "time" rather than "t") and that flags (such as "share") must be specified as TRUE. The "job-name", "ntasks" and "output" options are already determined by `slurm_call` and should not be manually set.
When processing the computation job, the Slurm cluster will output two files in the temporary folder: one with the return value of the function ("results_0.RDS") and one containing any console or error output produced by R ("slurm_[node_id].out").
If `submit = TRUE`, the job is sent to the cluster and a confirmation message (or error) is output to the console. If `submit = FALSE`, a message indicates the location of the saved data and script files; the job can be submitted manually by running the shell command `sbatch submit.sh` from that directory.
After sending the job to the Slurm cluster, `slurm_call` returns a `slurm_job` object which can be used to cancel the job, get the job status or output, and delete the temporary files associated with it. See the description of the related functions for more details.

A `slurm_job` object containing the `jobname` and the number of `nodes` effectively used.
`slurm_apply` to parallelize a function over a parameter set. `cancel_slurm`, `cleanup_files`, `get_slurm_out` and `get_job_status`, which use the output of this function.
This function creates a `slurm_job` object which can be passed to other functions such as `cancel_slurm`, `cleanup_files`, `get_slurm_out` and `get_job_status`.
slurm_job(jobname = NULL, jobid = NULL, nodes = NULL)
jobname | The name of the Slurm job. The rslurm-generated scripts and output files associated with a job should be found in the _rslurm_[jobname] folder. |
jobid | The id of the Slurm job created by the sbatch command. |
nodes | The number of cluster nodes used by that job. |
In general, `slurm_job` objects are created automatically as the output of `slurm_apply` or `slurm_call`, but it may be necessary to manually recreate one if the job was submitted in a different R session.

A `slurm_job` object.
Use `slurm_map` to compute a function over a list in parallel, spread across multiple nodes of a Slurm cluster, with syntax similar to `lapply`.
slurm_map(x, f, ..., jobname = NA, nodes = 2, cpus_per_node = 2,
    processes_per_node = cpus_per_node, preschedule_cores = TRUE,
    job_array_task_limit = NULL, global_objects = NULL,
    pkgs = rev(.packages()), libPaths = NULL, rscript_path = NULL,
    r_template = NULL, sh_template = NULL, slurm_options = list(),
    submit = TRUE)
x | A list to apply `f` to. |
f | A function that accepts one element of `x` as its first parameter and may return any type of R object. |
... | Additional arguments to `f`. |
jobname | The name of the Slurm job; if `NA`, it is assigned a random name. |
nodes | The (maximum) number of cluster nodes to spread the calculation over. |
cpus_per_node | The number of CPUs requested per node. This argument is mapped to the Slurm parameter `cpus-per-task`. |
processes_per_node | The number of logical CPUs to utilize per node, i.e. how many processes to run in parallel per node. This can exceed `cpus_per_node`. |
preschedule_cores | Corresponds to the `mc.preschedule` argument of `mclapply`. |
job_array_task_limit | The maximum number of job array tasks to run at the same time. Defaults to `NULL` (no limit). |
global_objects | A character vector containing the names of R objects to be saved in a .RData file and loaded on each cluster node prior to calling `f`. |
pkgs | A character vector containing the names of packages that must be loaded on each cluster node. By default, it includes all packages loaded by the user when `slurm_map` is called. |
libPaths | A character vector describing the location of additional R library trees to search through, or NULL. The default value of NULL corresponds to libraries returned by `.libPaths()` on the cluster node. |
rscript_path | The location of the Rscript command. If not specified, defaults to the location of Rscript within the R installation being run. |
r_template | The path to the template file for the R script run on each node. If NULL, uses the default template "rslurm/templates/slurm_run_R.txt". |
sh_template | The path to the template file for the sbatch submission script. If NULL, uses the default template "rslurm/templates/submit_sh.txt". |
slurm_options | A named list of options recognized by `sbatch`. |
submit | Whether or not to submit the job to the cluster with `sbatch`. |
This function creates a temporary folder ("_rslurm_[jobname]") in the current directory, holding .RData and .RDS data files, the R script to run and the Bash submission script generated for the Slurm job.
The set of input parameters is divided into equal chunks sent to each node, and `f` is evaluated in parallel within each node using functions from the `parallel` R package. The names of any other R objects (besides `x`) that `f` needs to access should be included in `global_objects` or passed as additional arguments through `...`.
Use `slurm_options` to set any option recognized by `sbatch`, e.g. `slurm_options = list(time = "1:00:00", share = TRUE)`. See http://slurm.schedmd.com/sbatch.html for details on possible options. Note that full names must be used (e.g. "time" rather than "t") and that flags (such as "share") must be specified as TRUE. The "array", "job-name", "nodes", "cpus-per-task" and "output" options are already determined by `slurm_map` and should not be manually set.
When processing the computation job, the Slurm cluster will output two types of files in the temporary folder: those containing the return values of the function for each subset of parameters ("results_[node_id].RDS") and those containing any console or error output produced by R on each node ("slurm_[node_id].out").
If `submit = TRUE`, the job is sent to the cluster and a confirmation message (or error) is output to the console. If `submit = FALSE`, a message indicates the location of the saved data and script files; the job can be submitted manually by running the shell command `sbatch submit.sh` from that directory.
After sending the job to the Slurm cluster, `slurm_map` returns a `slurm_job` object which can be used to cancel the job, get the job status or output, and delete the temporary files associated with it. See the description of the related functions for more details.

A `slurm_job` object containing the `jobname` and the number of `nodes` effectively used.
`slurm_call` to evaluate a single function call. `slurm_apply` to evaluate a function row-wise over a data frame of parameters. `cancel_slurm`, `cleanup_files`, `get_slurm_out` and `get_job_status`, which use the output of this function.
## Not run: 
sjob <- slurm_map(x, func) # x is the input list, func the function to map
get_job_status(sjob) # Prints console/error output once job is completed.
func_result <- get_slurm_out(sjob, "table") # Loads output data into R.
cleanup_files(sjob)
## End(Not run)