Pipeline Execution
- group_run.deploy_array_job(args, logger, time=None, label=None, group_len=None)
Configure a single array job for deployment.
- Parameters:
args – (obj) Set of command line arguments supplied by argparse.
logger – (obj) Logging object for message logging.
time – (str) Time specified by the current allocation/band
label – (str) Label to apply to the current allocation/band
group_len – (int) Integer size of allocation/band group.
- Returns:
None
- group_run.get_group_len(workdir, group, repeat_id='main') -> int
Implement parallel reads from a single ‘group’ file.
- Parameters:
workdir – (str) The path of the current pipeline working directory.
group – (str) The name of the dataset group within the pipeline.
repeat_id – (str) Repeat-id subset within the group; default is 'main'.
- Returns:
(int) The number of projects within the specified subset of a group of datasets.
- group_run.main(args) -> None
Assemble an sbatch script for jobs that run in parallel, then execute it. Each job may be allocated multiple tasks if that option is enabled.
- Parameters:
args – (obj) ArgParse object containing all required parameters, taken from default values or from specific command-line inputs.
- Returns:
None
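As an illustration of what assembling an sbatch script for an array job can look like, the following is a minimal sketch and not the pipeline's own implementation. The helper name `build_sbatch_script` and the `command` argument are hypothetical; `label`, `time` and `group_len` mirror the parameters documented for `deploy_array_job`. The `--job-name`, `--time` and `--array` directives are standard SLURM options.

```python
def build_sbatch_script(label, time, group_len, command):
    """Return the text of an sbatch array-job script (illustrative only).

    `build_sbatch_script` and `command` are hypothetical names used for
    this sketch; they are not part of group_run.
    """
    if group_len < 1:
        raise ValueError("group_len must be at least 1")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={label}",
        f"#SBATCH --time={time}",
        # One array task per dataset: indices 0 .. group_len - 1.
        f"#SBATCH --array=0-{group_len - 1}",
        "",
        # SLURM sets SLURM_ARRAY_TASK_ID for each task in the array.
        f"{command} $SLURM_ARRAY_TASK_ID",
    ]
    return "\n".join(lines) + "\n"


# Placeholder command; a real deployment would call the pipeline here.
script = build_sbatch_script("band1", "30:00", 4, "echo task")
```

The resulting string would typically be written to a file and submitted with `sbatch`; how group_run does this in practice is not shown here.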
- single_run.assemble_single_process(args, logger=None, jobid='', fh=None, logid=None) -> None
Process a single task and assemble required parameters. This task may sit within a subset, repeat id or larger group, but everything from here is concerned with the processing of a single dataset (task).
- Parameters:
args – (obj) Set of command line arguments supplied by argparse.
logger – (obj) Logging object for info/debug/error messages. Will create a new logger object if not given one.
jobid – (str) From SLURM_ARRAY_JOB_ID - matters for which log files are created.
fh – (str) Path to file for logger I/O when defining new logger.
logid – (str) If creating a new logger, an id is needed to distinguish this logger from those of other single processes (typically n of N total processes).
- Returns:
None
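The "create a new logger if not given one" behaviour described for `logger`, `fh` and `logid` can be sketched with the standard `logging` module. This is an assumption about the general pattern, not the pipeline's actual code; the helper name `get_or_create_logger` and the logger naming scheme are hypothetical.

```python
import logging


def get_or_create_logger(logger=None, fh=None, logid=None):
    """Return `logger` if given, otherwise build a new one (illustrative only)."""
    if logger is not None:
        return logger
    # Hypothetical naming scheme: logid keeps parallel processes distinct.
    name = "single_process" if logid is None else f"single_process_{logid}"
    new_logger = logging.getLogger(name)
    new_logger.setLevel(logging.INFO)
    # Attach a handler only once, so repeated calls do not duplicate output.
    if not new_logger.handlers:
        handler = logging.FileHandler(fh) if fh else logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(levelname)s [%(name)s]: %(message)s")
        )
        new_logger.addHandler(handler)
    return new_logger


log = get_or_create_logger(logid="3")
log.info("starting task 3")
```

When `fh` is a file path the messages go to that file; otherwise this sketch falls back to the console.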
- single_run.blacklisted(proj_code: str, groupdir: str, logger) -> bool
Determine if the current project code is blacklisted.
- Parameters:
groupdir – (str) The path to a group directory within the pipeline
proj_code – (str) The project code in string format (DOI)
logger – (obj) Logging object for info/debug/error messages.
- Returns:
(bool) True if the project code is in the blacklist, False otherwise.
- single_run.get_proj_code(workdir: str, group: str, pid, repeat_id, subset=0, id=0) -> str
Get the correct project code from a group of project codes, given a SLURM id.
- Parameters:
workdir – (str) The current pipeline working directory.
group – (str) The name of the group which this project code belongs to.
pid – (str) The project code for which to get the index.
repeat_id – (str) The subset within the group (default is main)
subset – (int) The size of the subset within this repeat group.
id – (int) The specific index of this subset within a group, e.g. with a subset value of 100 and 1000 total codes there are 10 codes per subset, so an id value of 2 refers to the third group of 10 codes.
- Returns:
The project code (DOI) in string format not index format.
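The indexing in the example for `id` (10 codes per subset, an id of 2 selecting the third group of 10) can be sketched as a simple slice. The helper `codes_in_subset` and its `per_subset` argument are hypothetical and only illustrate the arithmetic; they do not claim to reproduce how `get_proj_code` reads its inputs.

```python
def codes_in_subset(codes, per_subset, subset_id):
    """Return the block of codes selected by a zero-based subset id.

    Hypothetical helper for illustration; not part of single_run.
    """
    if per_subset < 1:
        raise ValueError("per_subset must be at least 1")
    start = subset_id * per_subset
    return codes[start:start + per_subset]


# 1000 placeholder codes, 10 per subset: id 2 is the third group of 10.
codes = [f"code_{i}" for i in range(1000)]
third_group = codes_in_subset(codes, per_subset=10, subset_id=2)
```

Here `third_group` holds the entries at positions 20 to 29 of the list, since ids are counted from zero.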
- single_run.main(args) -> None
Main function for processing a single job. This could be multiple tasks/datasets within a single job, but everything from here is serialised, i.e. run one after another.
- Parameters:
args – (obj) Set of command line arguments supplied by argparse.
- Returns:
None
- single_run.run_compute(args, logger, fh=None, logid=None, **kwargs) -> None
Set up computation parameters for an individual dataset.
- Parameters:
args – (obj) Set of command line arguments supplied by argparse.
logger – (obj) Logging object for info/debug/error messages.
fh – (str) Path to file for logger I/O when defining new logger.
logid – (str) Passed to KerchunkDSProcessor for specifying a logger component.
- Returns:
None
- single_run.run_init(args, logger, fh=None, **kwargs) -> None
Start initialisation for a single dataset.
- Parameters:
args – (obj) Set of command line arguments supplied by argparse.
logger – (obj) Logging object for info/debug/error messages.
fh – (str) Path to file for logger I/O when defining new logger.
- Returns:
None
- single_run.run_scan(args, logger, fh=None, **kwargs) -> None
Start the scanning process for an individual dataset.
- Parameters:
args – (obj) Set of command line arguments supplied by argparse.
logger – (obj) Logging object for info/debug/error messages.
fh – (str) Path to file for logger I/O when defining new logger.
- Returns:
None
- single_run.run_validation(args, logger, fh=None, **kwargs) -> None
Start validation of a single dataset.
- Parameters:
args – (obj) Set of command line arguments supplied by argparse.
logger – (obj) Logging object for info/debug/error messages.
fh – (str) Path to file for logger I/O when defining new logger.
- Returns:
None