Pipeline Execution

group_run.deploy_array_job(args, logger, time=None, label=None, group_len=None)

Configure a single array job for deployment.

Parameters:

args – (obj) Set of command line arguments supplied by argparse.
logger – (obj) Logging object for message logging.
time – (str) Time specified by the current allocation/band.
label – (str) Label to apply to the current allocation/band.
group_len – (int) Integer size of allocation/band group.

Returns:

None

group_run.get_group_len(workdir, group, repeat_id='main') → int

Implement parallel reads from a single ‘group’ file.

Parameters:

workdir – (str) The path of the current pipeline working directory.
group – (str) The name of the dataset group within the pipeline.
repeat_id – (str) Repeat-id subset within the group; default is 'main'.

Returns:

(int) The number of projects within the specified subset of a group of datasets.
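
As a rough illustration of what the returned count represents, the sketch below counts the non-empty lines in a text file of project codes. The file layout (one project code per line) and the helper name count_project_codes are assumptions made for this example, not the pipeline's confirmed implementation.

```python
from pathlib import Path


def count_project_codes(codes_file: str) -> int:
    """Count the non-empty lines in a text file of project codes.

    Hypothetical helper: assumes the file holds one project code per line.
    """
    text = Path(codes_file).read_text()
    # Blank lines are ignored so a trailing newline does not inflate the count.
    return sum(1 for line in text.splitlines() if line.strip())
```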

group_run.main(args) → None

Assemble sbatch script for parallel running jobs and execute. May include allocation of multiple tasks to each job if enabled.

Parameters:

args – (obj) ArgParse object containing all required parameters, taken from default values or from specific command-line inputs.

Returns:

None
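
To make the idea of assembling an sbatch script concrete, here is a minimal sketch that builds the text of a SLURM array-job script from a label, a time limit and a group length. The function name build_sbatch_script and the command argument are hypothetical; only the #SBATCH directives and the SLURM_ARRAY_TASK_ID variable are standard SLURM features. The real script assembled by the pipeline may contain different directives.

```python
def build_sbatch_script(label: str, time: str, group_len: int, command: str) -> str:
    """Return the text of a SLURM array-job script (illustrative only).

    Hypothetical helper: 'command' stands in for whatever the pipeline runs
    for each array task.
    """
    if group_len < 1:
        raise ValueError("group_len must be at least 1")

    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={label}",
        f"#SBATCH --time={time}",
        # One array task per dataset, indexed from 0.
        f"#SBATCH --array=0-{group_len - 1}",
        "",
        # SLURM sets SLURM_ARRAY_TASK_ID for each task in the array.
        f"{command} $SLURM_ARRAY_TASK_ID",
    ]
    return "\n".join(lines) + "\n"
```

The resulting string could be written to a file and submitted with sbatch; submission itself is left out of the sketch.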

single_run.assemble_single_process(args, logger=None, jobid='', fh=None, logid=None) → None

Process a single task and assemble required parameters. This task may sit within a subset, repeat id or larger group, but everything from here is concerned with the processing of a single dataset (task).

Parameters:

args – (obj) Set of command line arguments supplied by argparse.
logger – (obj) Logging object for info/debug/error messages. Will create a new logger object if not given one.
jobid – (str) From SLURM_ARRAY_JOB_ID - matters for which log files are created.
fh – (str) Path to file for logger I/O when defining new logger.
logid – (str) If creating a new logger, an id is needed to distinguish this logger from those of other single processes (typically n of N total processes).

Returns:

None
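
The logger, fh and logid parameters describe a "use the given logger, or create one" pattern. A minimal sketch of that pattern using Python's standard logging module is shown below. The helper name get_or_create_logger, the logger naming scheme and the message format are assumptions for illustration and are not taken from the pipeline source.

```python
import logging


def get_or_create_logger(logger=None, fh=None, logid=None):
    """Return the supplied logger, or build a new one (illustrative only).

    Hypothetical helper: the naming scheme and format are assumptions.
    """
    if logger is not None:
        return logger

    # A distinct name per logid keeps loggers for separate processes apart.
    name = f"single_process_{logid}" if logid is not None else "single_process"
    new_logger = logging.getLogger(name)
    new_logger.setLevel(logging.INFO)

    # Only attach a handler once, so repeated calls do not duplicate output.
    if not new_logger.handlers:
        handler = logging.FileHandler(fh) if fh else logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(levelname)s [%(name)s]: %(message)s")
        )
        new_logger.addHandler(handler)
    return new_logger
```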

single_run.blacklisted(proj_code: str, groupdir: str, logger) → bool

Determine if the current project code is blacklisted

Parameters:

groupdir – (str) The path to a group directory within the pipeline
proj_code – (str) The project code in string format (DOI)
logger – (obj) Logging object for info/debug/error messages.

Returns:

(bool) True if the project code is in the blacklist, False otherwise.
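
A blacklist check of this kind can be sketched as a lookup in a text file. The example below assumes a file with one entry per line, where the project code is the first comma-separated field. Both the file format and the helper name is_blacklisted are hypothetical and may differ from what the pipeline uses.

```python
from pathlib import Path


def is_blacklisted(proj_code: str, blacklist_file: str) -> bool:
    """Check whether proj_code appears in a blacklist file (illustrative only).

    Hypothetical format: one entry per line, project code first, with any
    further comma-separated fields (such as a reason) ignored.
    """
    path = Path(blacklist_file)
    if not path.is_file():
        # No blacklist file means nothing has been blacklisted.
        return False

    for line in path.read_text().splitlines():
        if line.split(",")[0].strip() == proj_code:
            return True
    return False
```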

single_run.get_proj_code(workdir: str, group: str, pid, repeat_id, subset=0, id=0) → str

Get the correct project code from a group of project codes, given a SLURM id.

Parameters:

workdir – (str) The current pipeline working directory.
group – (str) The name of the group which this project code belongs to.
pid – (str) The project code for which to get the index.
repeat_id – (str) The subset within the group (default is main)
subset – (int) The size of the subset within this repeat group.
id – (int) The specific index of this subset within a group. For example, with a subset value of 100 and 1000 total codes, there are 10 codes per subset; an id value of 2 then selects the third group of 10 codes.

Returns:

(str) The project code (DOI) in string format, not index format.
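
The subset and id arithmetic in the example above can be written out as a short sketch. It follows the numbers given in the parameter description (1000 codes, a subset value of 100, 10 codes per subset) and is not taken from the pipeline source. The helper name select_subset, the parameter name subset_id and the handling of a zero subset value are assumptions.

```python
def select_subset(codes: list, subset: int, subset_id: int) -> list:
    """Return the block of codes selected by subset_id (illustrative only).

    Hypothetical helper mirroring the documented example: the codes are
    divided by the subset value, and subset_id picks one block by index.
    """
    if subset <= 0:
        # Assumption: a subset value of 0 (the default) means no subsetting.
        return list(codes)

    per_subset = len(codes) // subset
    start = subset_id * per_subset
    return codes[start:start + per_subset]


codes = [f"code_{i}" for i in range(1000)]
third_block = select_subset(codes, subset=100, subset_id=2)
print(len(third_block), third_block[0], third_block[-1])  # 10 code_20 code_29
```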

single_run.main(args) → None

Main function for processing a single job. This could be multiple tasks/datasets within a single job, but everything from here is serialised, i.e. run one after another.

Parameters:

args – (obj) Set of command line arguments supplied by argparse.

Returns:

None

single_run.run_compute(args, logger, fh=None, logid=None, **kwargs) → None

Set up computation parameters for an individual dataset.

Parameters:

args – (obj) Set of command line arguments supplied by argparse.
logger – (obj) Logging object for info/debug/error messages.
fh – (str) Path to file for logger I/O when defining new logger.
logid – (str) Passed to KerchunkDSProcessor for specifying a logger component.

Returns:

None

single_run.run_init(args, logger, fh=None, **kwargs) → None

Start initialisation for a single dataset.

Parameters:

args – (obj) Set of command line arguments supplied by argparse.
logger – (obj) Logging object for info/debug/error messages.
fh – (str) Path to file for logger I/O when defining new logger.

Returns:

None

single_run.run_scan(args, logger, fh=None, **kwargs) → None

Start the scanning process for an individual dataset.

Parameters:

args – (obj) Set of command line arguments supplied by argparse.
logger – (obj) Logging object for info/debug/error messages.
fh – (str) Path to file for logger I/O when defining new logger.

Returns:

None

single_run.run_validation(args, logger, fh=None, **kwargs) → None

Start validation of a single dataset.

Parameters:

args – (obj) Set of command line arguments supplied by argparse.
logger – (obj) Logging object for info/debug/error messages.
fh – (str) Path to file for logger I/O when defining new logger.

Returns:

None