Pipeline Flags

BypassSwitch Options

Certain non-fatal errors may be bypassed using the Bypass flag:

Format: -b "DBSCR"

Default: "DBSCR" (default options are marked with '*' below)

"D" - * Skip driver failures - Pipeline tries different options for NetCDF (default).
    -   Only need to turn this skip off if all drivers fail (KerchunkFatalDriverError).
"B" - * Skip Box compute errors.
"S" - * Skip Soft fails (NaN-only boxes in validation) (default).
"C" - * Skip calculation (data sum) errors (time array typically cannot be summed) (default).
"X" -   Skip initial shape errors, by attempting XKShape tolerance method (special case.)
"R" - * Skip reporting to status_log which becomes visible with assessor. Reporting is skipped
        by default in single_run.py but overridden when using group_run.py so any serial
        testing does not by default report the error experienced to the status log for that project.
"F" -   Skip scanning (fasttrack) and go straight to compute. Required if running compute before scan
        is attempted.

Single Dataset Operation

Run all single-dataset processes with the single_run.py script.

usage: single_run.py [-h] [-f] [-v] [-d] [-Q] [-B] [-A] [-w WORKDIR] [-g GROUPDIR] [-p PROJ_DIR]
                     [-t TIME_ALLOWED] [-G GROUPID] [-M MEMORY] [-s SUBSET]
                     [-r REPEAT_ID] [-b BYPASS] [-n NEW_VERSION] [-m MODE] [-O OVERRIDE_TYPE]
                     phase proj_code

Run a pipeline step for a single dataset

positional arguments:
  phase                 Phase of the pipeline to initiate
  proj_code             Project identifier code

options:
  -h, --help            show this help message and exit
  -f, --forceful        Force overwrite of steps if previously done
  -v, --verbose         Print helpful statements while running
  -d, --dryrun          Perform dry-run (i.e. no new files/dirs created)
  -Q, --quality         Create refs from scratch (no loading), use all NetCDF files in validation
  -B, --backtrack       Backtrack to previous position, remove files that would be created in this job.
  -A, --alloc-bins      Use binpacking for allocations (otherwise will use banding)

  -w WORKDIR, --workdir WORKDIR
                        Working directory for pipeline
  -g GROUPDIR, --groupdir GROUPDIR
                        Group directory for pipeline
  -p PROJ_DIR, --proj_dir PROJ_DIR
                        Project directory for pipeline
  -t TIME_ALLOWED, --time-allowed TIME_ALLOWED
                        Time limit for this job
  -G GROUPID, --groupID GROUPID
                        Group identifier label
  -M MEMORY, --memory MEMORY
                        Memory allocation for this job (e.g. "2G" for 2GB)
  -s SUBSET, --subset SUBSET
                        Size of subset within group
  -r REPEAT_ID, --repeat_id REPEAT_ID
                        Repeat id (1 if first time running, <phase>_<repeat> otherwise)
  -b BYPASS, --bypass-errs BYPASS
                        Bypass switch options: See Above

  -n NEW_VERSION, --new_version NEW_VERSION
                        If present, create a new version
  -m MODE, --mode MODE  Print or record information (log or std)
  -O OVERRIDE_TYPE, --override_type OVERRIDE_TYPE
                        Specify cloud-format output type, overrides any determination by pipeline.
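For illustration, a minimal single-dataset call might be composed as follows; the working directory and project code are placeholders, not values taken from the pipeline:

```shell
# Placeholders: adjust WORKDIR and PROJ_CODE for your setup.
WORKDIR=/path/to/workdir
PROJ_CODE=example-proj-001

# Dry-run the scan phase verbosely for a single dataset.
CMD="python single_run.py scan $PROJ_CODE -w $WORKDIR -v -d"
echo "$CMD"
```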

Multi-Dataset Group Operation

Run all multi-dataset group processes within the pipeline using the group_run.py script.

usage: group_run.py [-h] [-S SOURCE] [-e VENVPATH] [-i INPUT] [-A] [--allow-band-increase] [-f] [-v] [-d] [-Q] [-b BYPASS] [-B] [-w WORKDIR] [-g GROUPDIR]
                    [-p PROJ_DIR] [-G GROUPID] [-t TIME_ALLOWED] [-M MEMORY] [-s SUBSET] [-r REPEAT_ID] [-n NEW_VERSION] [-m MODE]
                    phase groupID

Run a pipeline step for a group of datasets

positional arguments:
  phase                 Phase of the pipeline to initiate
  groupID               Group identifier code

options:
  -h, --help            show this help message and exit
  -S SOURCE, --source SOURCE
                        Path to directory containing master scripts (this one)
  -e VENVPATH, --environ VENVPATH
                        Path to virtual (e)nvironment (excludes /bin/activate)
  -i INPUT, --input INPUT
                        input file (for init phase)
  -A, --alloc-bins      Use binpacking for allocations (otherwise will use banding)

  --allow-band-increase
                        Allow automatic banding increase relative to previous runs.

  -f, --forceful        Force overwrite of steps if previously done
  -v, --verbose         Print helpful statements while running
  -d, --dryrun          Perform dry-run (i.e. no new files/dirs created)
  -Q, --quality         Quality assured checks - thorough run

  -b BYPASS, --bypass-errs BYPASS
                        Bypass switch options: See Above

  -B, --backtrack       Backtrack to previous position, remove files that would be created in this job.
  -w WORKDIR, --workdir WORKDIR
                        Working directory for pipeline
  -g GROUPDIR, --groupdir GROUPDIR
                        Group directory for pipeline
  -p PROJ_DIR, --proj_dir PROJ_DIR
                        Project directory for pipeline
  -G GROUPID, --groupID GROUPID
                        Group identifier label
  -t TIME_ALLOWED, --time-allowed TIME_ALLOWED
                        Time limit for this job
  -M MEMORY, --memory MEMORY
                        Memory allocation for this job (e.g. "2G" for 2GB)
  -s SUBSET, --subset SUBSET
                        Size of subset within group
  -r REPEAT_ID, --repeat_id REPEAT_ID
                        Repeat id (main if first time running, <phase>_<repeat> otherwise)
  -n NEW_VERSION, --new_version NEW_VERSION
                        If present, create a new version
  -m MODE, --mode MODE  Print or record information (log or std)
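As a hedged sketch (the group ID and input file path are placeholders), initialising a new group from an input file and then launching the scan phase might be composed as:

```shell
# Placeholders: adjust GROUP_ID and INPUT for your setup.
GROUP_ID=example-group
INPUT=/path/to/datasets.csv

# Initialise the group from an input file, then launch the scan phase.
CMD_INIT="python group_run.py init $GROUP_ID -i $INPUT"
CMD_SCAN="python group_run.py scan $GROUP_ID"
echo "$CMD_INIT"
echo "$CMD_SCAN"
```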

Assessor Tool Operation

Perform assessments of groups within the pipeline using the assess.py script.

usage: assess.py [-h] [-B] [-R REASON] [-s OPTION] [-c CLEANUP] [-U UPGRADE] [-l] [-j JOBID] [-p PHASE] [-r REPEAT_ID] [-n NEW_ID] [-N NUMBERS] [-e ERROR] [-E] [-W]
                 [-O] [-w WORKDIR] [-g GROUPDIR] [-v] [-m MODE]
                 operation groupID

Assess and manage a group of datasets within the pipeline

positional arguments:
  operation             Operation to perform - choose from ['progress', 'blacklist', 'upgrade', 'summarise', 'display', 'cleanup', 'match',
                        'status_log']
  groupID               Group identifier code for the group on which to operate.

options:
  -h, --help            show this help message and exit
  -B, --blacklist       Use when saving project codes to the blacklist

  -R REASON, --reason REASON
                        Provide the reason for handling project codes when saving to the blacklist or upgrading
  -s OPTION, --show-opts OPTION
                        Show options for jobids, labels, also used in matching and status_log.
  -c CLEANUP, --clean-up CLEANUP
                        Clean up group directory of errors/outputs/labels
  -U UPGRADE, --upgrade UPGRADE
                        Upgrade to new version
  -l, --long            Show long error message (no concatenation)
  -j JOBID, --jobid JOBID
                        Identifier of job to inspect
  -p PHASE, --phase PHASE
                        Pipeline phase to inspect
  -r REPEAT_ID, --repeat_id REPEAT_ID
                        Inspect an existing ID for errors
  -n NEW_ID, --new_id NEW_ID
                        Create a new repeat ID, specify selection of codes by phase, error etc.
  -N NUMBERS, --numbers NUMBERS
                        Show project code IDs for lists of codes less than the N value specified here.
  -e ERROR, --error ERROR
                        Inspect error of a specific type
  -E, --examine         Examine log outputs individually.
  -W, --write           Write outputs to files
  -O, --overwrite       Force overwrite of steps if previously done
  -w WORKDIR, --workdir WORKDIR
                        Working directory for pipeline
  -g GROUPDIR, --groupdir GROUPDIR
                        Group directory for pipeline
  -v, --verbose         Print helpful statements while running
  -m MODE, --mode MODE  Print or record information (log or std)
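For example (the group ID is a placeholder, and which flags each operation accepts is an assumption here), checking overall progress and then narrowing to a specific phase and error type might be composed as:

```shell
# Placeholder group identifier.
GROUP_ID=example-group

# Show overall progress for the group.
CMD_PROGRESS="python assess.py progress $GROUP_ID"

# Narrow the inspection to the compute phase and one error type.
CMD_ERRORS="python assess.py progress $GROUP_ID -p compute -e KerchunkFatalDriverError"
echo "$CMD_PROGRESS"
echo "$CMD_ERRORS"
```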