Pipeline Execution

group_run.deploy_array_job(args, logger, time=None, label=None, group_len=None)

Configure a single array job for deployment.

Parameters:
  • args – (obj) Set of command line arguments supplied by argparse.

  • logger – (obj) Logging object for message logging.

  • time – (str) Time specified by the current allocation/band

  • label – (str) Label to apply to the current allocation/band

  • group_len – (int) Integer size of allocation/band group.

Returns:

None
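
To illustrate how time, label and group_len might feed into an array job, the following is a minimal sketch, not the pipeline's actual implementation. The function name build_sbatch_script, the default command and the output file patterns are assumptions made for illustration; only the #SBATCH directives themselves are standard SLURM options.

```python
def build_sbatch_script(label, group_len, time="30:00",
                        command="python single_run.py"):
    """Return the text of a SLURM array-job script (illustrative only).

    `command` is a hypothetical entry point, not the pipeline's real one.
    """
    if group_len < 1:
        raise ValueError("group_len must be at least 1")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={label}",
        f"#SBATCH --time={time}",
        # Array indices are inclusive, so N tasks span 0..N-1.
        f"#SBATCH --array=0-{group_len - 1}",
        # %A is the array job ID, %a is the index of the individual task.
        "#SBATCH --output=%A_%a.out",
        "#SBATCH --error=%A_%a.err",
        "",
        # SLURM exports the per-task index as SLURM_ARRAY_TASK_ID.
        f"{command} $SLURM_ARRAY_TASK_ID",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_sbatch_script(label="scan", group_len=10, time="10:00"))
```

Building the script as a string keeps it easy to inspect before it is written to disk and submitted with sbatch.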

group_run.get_group_len(workdir, group, repeat_id='main') → int

Implement parallel reads from a single ‘group’ file.

Parameters:
  • workdir – (str) The path of the current pipeline working directory.

  • group – (str) The name of the dataset group within the pipeline.

  • repeat_id – (str) Repeat-id subset within the group; the default is 'main'.

Returns:

(int) The number of projects within the specified subset of a group of datasets.
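
As a rough sketch of what counting the projects in a group could look like, the example below assumes a hypothetical layout in which each repeat id has a plain-text file with one project code per line. The directory structure, file name and function name are assumptions for illustration and may not match the pipeline.

```python
import os


def count_group_entries(workdir, group, repeat_id="main"):
    """Count the project codes listed for one repeat id of a group.

    Hypothetical layout (an assumption, not confirmed by the pipeline):
        <workdir>/groups/<group>/proj_codes/<repeat_id>.txt
    with one project code per line.
    """
    path = os.path.join(workdir, "groups", group, "proj_codes",
                        f"{repeat_id}.txt")
    with open(path) as f:
        # Ignore blank lines so a trailing newline is not counted.
        return sum(1 for line in f if line.strip())
```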

group_run.main(args) → None

Assemble sbatch script for parallel running jobs and execute. May include allocation of multiple tasks to each job if enabled.

Parameters:
  • args – (Object) ArgParse object containing all required parameters, taken from default values or from specific command-line inputs.

Returns:

None

single_run.assemble_single_process(args, logger=None, jobid='', fh=None, logid=None) → None

Process a single task and assemble required parameters. This task may sit within a subset, repeat id or larger group, but everything from here is concerned with the processing of a single dataset (task).

Parameters:
  • args – (obj) Set of command line arguments supplied by argparse.

  • logger – (obj) Logging object for info/debug/error messages. Will create a new logger object if not given one.

  • jobid – (str) From SLURM_ARRAY_JOB_ID - matters for which log files are created.

  • fh – (str) Path to file for logger I/O when defining new logger.

  • logid – (str) If creating a new logger, will need an id to distinguish this logger from other single processes (typically n of N total processes.)

Returns:

None
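
The fallback behaviour described for logger, fh and logid can be sketched with the standard logging module. This is a minimal illustration under stated assumptions: the helper name make_logger, the logger naming scheme and the message format are hypothetical and are not taken from the pipeline.

```python
import logging


def make_logger(name, fh=None, logid=None, level=logging.INFO):
    """Create a logger that writes to file `fh` if given, else to the console."""
    # A distinct logid gives each single process its own logger name.
    full_name = f"{name}_{logid}" if logid is not None else name
    logger = logging.getLogger(full_name)
    logger.setLevel(level)
    logger.propagate = False
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(fh) if fh else logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(levelname)s [%(name)s]: %(message)s")
        )
        logger.addHandler(handler)
    return logger


def resolve_logger(logger=None, fh=None, logid=None):
    """Reuse the caller's logger, or build a new one when none is supplied."""
    return logger if logger is not None else make_logger("single", fh, logid)
```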

single_run.blacklisted(proj_code: str, groupdir: str, logger) → bool

Determine if the current project code is blacklisted

Parameters:
  • groupdir – (str) The path to a group directory within the pipeline

  • proj_code – (str) The project code in string format (DOI)

  • logger – (obj) Logging object for info/debug/error messages.

Returns:

(bool) True if the project code is in the blacklist, False otherwise.
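
A blacklist check of this kind can be sketched as a lookup in a per-group file. The file name blacklist_codes.txt, its one-entry-per-line format and the function name is_blacklisted are assumptions for illustration; the pipeline's real storage format may differ.

```python
import os


def is_blacklisted(proj_code, groupdir, filename="blacklist_codes.txt"):
    """Return True if proj_code appears in the group's blacklist file.

    Assumes (hypothetically) one entry per line, with the project code as
    the first comma-separated field.
    """
    path = os.path.join(groupdir, filename)
    if not os.path.isfile(path):
        return False  # no blacklist file means nothing is blacklisted
    with open(path) as f:
        codes = {line.split(",")[0].strip() for line in f if line.strip()}
    return proj_code in codes
```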

single_run.get_proj_code(workdir: str, group: str, pid, repeat_id, subset=0, id=0) → str

Get the correct project code, given a SLURM id, from a group of project codes.

Parameters:
  • workdir – (str) The current pipeline working directory.

  • group – (str) The name of the group which this project code belongs to.

  • pid – (str) The project code for which to get the index.

  • repeat_id – (str) The subset within the group (default is main)

  • subset – (int) The size of the subset within this repeat group.

  • id – (int) The specific index of this subset within a group. For example, with a subset value of 100 and 1000 codes in total, there are 10 codes per subset; an id value of 2 then refers to the third group of 10 codes.

Returns:

(str) The project code (DOI) as a string, not as an index.
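
The subset arithmetic in the id description can be worked through in a few lines. This sketch follows the example's numbers (1000 codes, a subset value of 100, 10 codes per subset) and therefore treats subset as the number of partitions, which is one reading of the description. The helper subset_slice and the generated code names are hypothetical.

```python
def subset_slice(codes, subset, subset_id):
    """Return the codes belonging to partition `subset_id` (0-based)."""
    per_subset = len(codes) // subset  # 1000 codes / 100 = 10 per subset
    start = subset_id * per_subset
    return codes[start:start + per_subset]


# Placeholder project codes, purely for illustration.
codes = [f"code_{i}" for i in range(1000)]

# An id of 2 selects the third group of 10 codes: code_20 to code_29.
third = subset_slice(codes, subset=100, subset_id=2)
```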

single_run.main(args) → None

Main function for processing a single job. This could be multiple tasks/datasets within a single job, but everything from here is serialised, i.e. run one after another.

Parameters:
  • args – (obj) Set of command line arguments supplied by argparse.

Returns:

None

single_run.run_compute(args, logger, fh=None, logid=None, **kwargs) → None

Set up computation parameters for an individual dataset.

Parameters:
  • args – (obj) Set of command line arguments supplied by argparse.

  • logger – (obj) Logging object for info/debug/error messages.

  • fh – (str) Path to file for logger I/O when defining new logger.

  • logid – (str) Passed to KerchunkDSProcessor for specifying a logger component.

Returns:

None

single_run.run_init(args, logger, fh=None, **kwargs) → None

Start initialisation for single dataset

Parameters:
  • args – (obj) Set of command line arguments supplied by argparse.

  • logger – (obj) Logging object for info/debug/error messages.

  • fh – (str) Path to file for logger I/O when defining new logger.

Returns:

None

single_run.run_scan(args, logger, fh=None, **kwargs) → None

Start scanning process for individual dataset

Parameters:
  • args – (obj) Set of command line arguments supplied by argparse.

  • logger – (obj) Logging object for info/debug/error messages.

  • fh – (str) Path to file for logger I/O when defining new logger.

Returns:

None

single_run.run_validation(args, logger, fh=None, **kwargs) → None

Start validation of single dataset.

Parameters:
  • args – (obj) Set of command line arguments supplied by argparse.

  • logger – (obj) Logging object for info/debug/error messages.

  • fh – (str) Path to file for logger I/O when defining new logger.

Returns:

None