Padocc Utility Scripts

Utilities

class pipeline.utils.BypassSwitch(switch='DBSCLR')

Bases: object

Class to represent all bypass switches throughout the pipeline. Requires a switch string which is used to enable/disable specific pipeline switches stored in this class.
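As a rough illustration of the idea, a switch string can be treated as a set of single-letter flags. The sketch below is not the padocc implementation: the class name `SwitchSketch`, the `enabled` mapping, and the choice to reject unknown letters are all assumptions made for the example. The recognised letters are simply taken from the default value 'DBSCLR'.

```python
class SwitchSketch:
    """Illustrative stand-in only: one boolean flag per letter of the switch string."""

    # Assumed set of recognised letters, copied from the documented default.
    KNOWN = "DBSCLR"

    def __init__(self, switch: str = "DBSCLR"):
        unknown = set(switch) - set(self.KNOWN)
        if unknown:
            # Design choice of this sketch, not documented behaviour.
            raise ValueError(f"Unrecognised switch letters: {sorted(unknown)}")
        # A letter present in the string turns the corresponding flag on.
        self.enabled = {letter: letter in switch for letter in self.KNOWN}

    def __str__(self) -> str:
        # Rebuild the switch string in canonical order.
        return "".join(letter for letter in self.KNOWN if self.enabled[letter])


sw = SwitchSketch("DS")
default_sw = SwitchSketch()
```

Here `sw.enabled["D"]` is True while `sw.enabled["B"]` is False, and the default string switches every flag on.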

pipeline.utils.find_zarrays(refs: dict) dict

Quick way of extracting all the zarray components of a ref set.
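A minimal sketch of this kind of extraction, assuming `refs` is the inner mapping of a Kerchunk reference set in which array metadata lives under keys ending in `.zarray` and is stored as JSON strings. The function name `find_zarrays_sketch` and the decision to decode the JSON are assumptions; the real function may return the values in a different shape.

```python
import json


def find_zarrays_sketch(refs: dict) -> dict:
    """Collect every '<variable>/.zarray' entry from a reference mapping."""
    zarrays = {}
    for key, value in refs.items():
        if key.endswith(".zarray"):
            # Metadata entries are assumed to be JSON strings; decode them if so.
            zarrays[key] = json.loads(value) if isinstance(value, str) else value
    return zarrays


# Small hand-written reference set used only for illustration.
refs = {
    ".zgroup": '{"zarr_format": 2}',
    "temp/.zarray": '{"shape": [10, 20], "chunks": [5, 20], "dtype": "<f8"}',
    "temp/.zattrs": '{"units": "K"}',
    "temp/0.0": ["file.nc", 1000, 800],
}
zarrays = find_zarrays_sketch(refs)
```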

pipeline.utils.format_str(string: str, length: int, concat=False) str

Simple function to format a string to a correct length.
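One plausible reading of this behaviour is padding short strings and cutting long ones so that the result always has exactly `length` characters. In the sketch below, the meaning of `concat` (marking a truncated string with a trailing ellipsis) is an assumption, as is the name `format_str_sketch`.

```python
def format_str_sketch(string: str, length: int, concat: bool = False) -> str:
    """Return `string` padded or cut to exactly `length` characters."""
    string = str(string)
    if concat and len(string) > length:
        # Assumed meaning of concat: show that the text was cut short.
        return string[: max(length - 3, 0)] + "..."[:length]
    # Pad with spaces on the right, then cut anything beyond `length`.
    return string.ljust(length)[:length]


padded = format_str_sketch("abc", 6)
cut = format_str_sketch("abcdefghij", 6)
marked = format_str_sketch("abcdefghij", 6, concat=True)
```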

pipeline.utils.get_attribute(env: str, args, var: str) str

Retrieve a value from an environment variable or from a passed argument. Looks up the value in the environment or in the ParseArgs object, and reports a failure if it cannot be found in either.

Parameters:
  • env – (str) Name of environment variable.

  • args – (obj) Set of command line arguments supplied by argparse.

  • var – (str) Name of argparse parameter to check.

Returns:

Value of either environment variable or argparse value.
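The lookup described above can be sketched as follows. The order of precedence (argument first, then environment) and the use of `KeyError` to report failure are assumptions of this sketch, and `SKETCH_WORKDIR` is a made-up variable name used only for the example.

```python
import argparse
import os


def get_attribute_sketch(env: str, args, var: str) -> str:
    """Return the argparse value if set, else the environment variable."""
    # Assumed precedence: an explicit command line value wins.
    value = getattr(args, var, None)
    if value:
        return value
    if os.environ.get(env):
        return os.environ[env]
    # Neither source provides a value: report the failure.
    raise KeyError(f"Neither argument '{var}' nor environment variable '{env}' is set")


args = argparse.Namespace(workdir=None)
os.environ["SKETCH_WORKDIR"] = "/tmp/work"
from_env = get_attribute_sketch("SKETCH_WORKDIR", args, "workdir")

args.workdir = "/data/work"
from_args = get_attribute_sketch("SKETCH_WORKDIR", args, "workdir")
```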

pipeline.utils.get_blacklist(group: str, workdir: str) list

Returns the list of blacklisted project codes for a group.

Parameters:
  • group – (str) Name of current group or path to group directory (groupdir) in which case workdir can be left as None.

  • workdir – (str) Path to working directory or None. If this is None, group value will be assumed as the groupdir path.

Returns:

A list of codes if the file is found, an empty list otherwise.

pipeline.utils.get_codes(group: str, workdir: str, filename: str, extension='.txt') list

Returns a list of the project codes given a filename (repeat id)

Parameters:
  • group – (str) Name of current group or path to group directory (groupdir) in which case workdir can be left as None.

  • workdir – (str) Path to working directory or None. If this is None, group value will be assumed as the groupdir path.

  • filename – (str) Name of text file to access within group (or path within the groupdir to the text file).

  • extension – (str) For the specific case of non-text-files.

Returns:

A list of codes if the file is found, an empty list otherwise.
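A minimal sketch of the documented contract: one code per line, and an empty list when the file does not exist. The `groups` subdirectory used when `workdir` is given is a hypothetical layout, not the actual padocc directory structure, and `get_codes_sketch` is an illustrative name.

```python
import os
import tempfile


def get_codes_sketch(group: str, workdir, filename: str, extension: str = ".txt") -> list:
    """Read one project code per line, or return [] if the file is missing."""
    # With workdir=None the group value is treated as the groupdir path.
    # The 'groups' folder below is a hypothetical layout for this sketch.
    groupdir = group if workdir is None else os.path.join(workdir, "groups", group)
    path = os.path.join(groupdir, filename + extension)
    if not os.path.isfile(path):
        return []
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


with tempfile.TemporaryDirectory() as groupdir:
    with open(os.path.join(groupdir, "proj_codes.txt"), "w") as f:
        f.write("code_a\ncode_b\n")
    found = get_codes_sketch(groupdir, None, "proj_codes")
    missing = get_codes_sketch(groupdir, None, "not_there")
```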

pipeline.utils.get_proj_dir(proj_code: str, workdir: str, groupID: str) str

Simple function to assemble the project directory path, which depends on the groupID. May be redundant in the future if a ‘serial’ directory is added.

pipeline.utils.get_proj_file(proj_dir: str, proj_file: str) dict

Returns the contents of a project file within a project code directory.

Parameters:
  • proj_dir – (str) The project code directory path.

  • proj_file – (str) Name of a file to access within the project directory.

Returns:

A dictionary of the contents of a json file or None if there are problems.
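The read-or-None behaviour can be sketched like this. Which problems the real function catches is not stated, so treating a missing file and invalid JSON as the failure cases is an assumption; `get_proj_file_sketch` and the file name `detail-cfg.json` are used here only for illustration.

```python
import json
import os
import tempfile


def get_proj_file_sketch(proj_dir: str, proj_file: str):
    """Return the parsed JSON contents of a project file, or None on failure."""
    path = os.path.join(proj_dir, proj_file)
    try:
        with open(path) as f:
            return json.load(f)
    except (OSError, json.JSONDecodeError):
        # Missing, unreadable or malformed file: signal the problem with None.
        return None


with tempfile.TemporaryDirectory() as proj_dir:
    with open(os.path.join(proj_dir, "detail-cfg.json"), "w") as f:
        json.dump({"source": "example"}, f)
    with open(os.path.join(proj_dir, "broken.json"), "w") as f:
        f.write("{not valid json")
    loaded = get_proj_file_sketch(proj_dir, "detail-cfg.json")
    broken = get_proj_file_sketch(proj_dir, "broken.json")
    absent = get_proj_file_sketch(proj_dir, "absent.json")
```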

pipeline.utils.mem_to_val(value: str) float

Convert a memory size given as a string to a number of bytes.
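A sketch of such a conversion, under two assumptions that the documentation does not confirm: the input looks like a number followed by a unit ('2 GB' or '2GB'), and units are decimal (1 KB = 1000 bytes). The name `mem_to_val_sketch` and the `UNITS` table are illustrative.

```python
import re

# Assumed decimal multipliers; a binary convention (1024) is equally plausible.
UNITS = {"B": 1, "KB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4, "PB": 1000**5}


def mem_to_val_sketch(value: str) -> float:
    """Convert a string such as '2 GB' into a number of bytes."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*([A-Za-z]+)\s*", value)
    if match is None:
        raise ValueError(f"Cannot parse memory value: {value!r}")
    number, unit = match.groups()
    return float(number) * UNITS[unit.upper()]


two_gb = mem_to_val_sketch("2 GB")
one_and_a_half_mb = mem_to_val_sketch("1.5MB")
```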

pipeline.utils.open_kerchunk(kfile: str, logger, isparq=False, retry=False, attempt=1, **kwargs) Dataset

Open kerchunk file from JSON/parquet formats

Parameters:
  • kfile – (str) Path to a kerchunk file (or https link if using a remote file)

  • logger – (obj) Logging object for info/debug/error messages.

  • isparq – (bool) Switch for using Parquet or JSON Format

  • remote_protocol – (str) ‘file’ for local filepaths, ‘http’ for remote links.

Returns:

An xarray virtual dataset constructed from the Kerchunk file
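For the JSON case, a common way to open a Kerchunk reference file with xarray is through the fsspec `reference://` filesystem and the zarr engine. The sketch below only assembles the keyword arguments for that call, so it runs without xarray installed. It is not the body of open_kerchunk; the helper name `kerchunk_open_kwargs` and the example URL are assumptions.

```python
def kerchunk_open_kwargs(kfile: str, remote_protocol: str = "file") -> dict:
    """Build keyword arguments for xarray.open_dataset on a Kerchunk JSON file."""
    return {
        "filename_or_obj": "reference://",
        "engine": "zarr",
        "backend_kwargs": {
            "consolidated": False,
            "storage_options": {
                "fo": kfile,                         # path or URL of the reference file
                "remote_protocol": remote_protocol,  # how the referenced chunks are read
            },
        },
    }


kwargs = kerchunk_open_kwargs("https://example.org/data/kerchunk.json", "http")

# With xarray, fsspec and zarr installed, the dataset could then be opened as:
#   import xarray as xr
#   ds = xr.open_dataset(**kwargs)
```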

pipeline.utils.set_codes(group: str, workdir: str, filename: str, contents, extension='.txt', overwrite=0) None

Writes the given contents to a file of project codes within a group, identified by filename (repeat id).

Parameters:
  • group – (str) Name of current group or path to group directory (groupdir) in which case workdir can be left as None.

  • workdir – (str) Path to working directory or None. If this is None, group value will be assumed as the groupdir path.

  • filename – (str) Name of text file to access within group (or path within the groupdir to the text file).

  • contents – (str) Combined contents to write to the file.

  • extension – (str) For the specific case of non-text-files.

  • overwrite – (str) Specifier for Python's built-in open() function: either overwrite the file contents completely or append to the existing values.

Returns:

None
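The difference between overwriting and appending comes down to the mode passed to open(). The sketch below takes the mode directly ('w' or 'a') and a full file path; both are simplifications, since how the real overwrite value maps onto a mode and how the path is assembled are not shown here. The name `set_codes_sketch` is illustrative.

```python
import os
import tempfile


def set_codes_sketch(path: str, contents: str, mode: str = "w") -> None:
    """Write `contents` to `path`, replacing ('w') or extending ('a') the file."""
    if mode not in ("w", "a"):
        raise ValueError("mode must be 'w' (overwrite) or 'a' (append)")
    with open(path, mode) as f:
        f.write(contents)


with tempfile.TemporaryDirectory() as groupdir:
    path = os.path.join(groupdir, "proj_codes.txt")
    set_codes_sketch(path, "code_a\n")
    set_codes_sketch(path, "code_b\n", mode="a")  # keeps code_a
    with open(path) as f:
        appended = f.read()
    set_codes_sketch(path, "code_c\n", mode="w")  # discards earlier contents
    with open(path) as f:
        overwritten = f.read()
```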

pipeline.utils.set_proj_file(proj_dir: str, proj_file: str, contents: dict, logger: Logger) None

Overwrite the contents of a project file within a project code directory.

Parameters:
  • proj_dir – (str) The project code directory path.

  • proj_file – (str) Name of a file to access within the project directory.

  • contents – (dict) Dictionary to write into json format config file within the project directory.

  • logger – (obj) Logging object for info/debug/error messages.

Returns:

None

Logging

pipeline.logs.init_logger(verbose, mode, name, fh=None, logid=None)

Initialise a logger object and configure it with formatting.
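A minimal sketch of logger setup using the standard logging module. It covers only `verbose`, `name` and `fh`; the `mode` and `logid` parameters are left out because their meaning is not described here. The mapping from verbosity to log level and the message format are assumptions of the sketch.

```python
import logging

# Assumed mapping: 0 -> warnings only, 1 -> info, 2 or more -> debug.
LEVELS = [logging.WARNING, logging.INFO, logging.DEBUG]


def init_logger_sketch(verbose: int, name: str, fh: str = None) -> logging.Logger:
    """Create a named logger with one formatted handler."""
    logger = logging.getLogger(name)
    logger.setLevel(LEVELS[max(0, min(verbose, len(LEVELS) - 1))])
    # Drop handlers from earlier calls so messages are not duplicated.
    logger.handlers.clear()
    handler = logging.FileHandler(fh) if fh else logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(levelname)s [%(name)s]: %(message)s"))
    logger.addHandler(handler)
    return logger


logger = init_logger_sketch(2, "sketch")
logger.debug("Logger configured")
```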

pipeline.logs.log_status(phase, proj_dir, status, logger, jobid='', dryrun='')

Find the status file for this project code and add a new entry for the status. Each entry should be of the form:

phase, status, time, jobid
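Appending one comma-separated entry per status update can be sketched as below. The file name `status_log.csv`, the timestamp format, and the choice to skip the write when `dryrun` is set are assumptions of this sketch; the logger argument is omitted for brevity.

```python
import os
import tempfile
from datetime import datetime


def log_status_sketch(phase: str, proj_dir: str, status: str, jobid: str = "", dryrun: bool = False) -> str:
    """Append a 'phase,status,time,jobid' entry to the project's status file."""
    timestamp = datetime.now().strftime("%H:%M %d/%m/%y")
    entry = ",".join([phase, status, timestamp, jobid])
    if not dryrun:
        # Hypothetical file name; append so earlier entries are preserved.
        with open(os.path.join(proj_dir, "status_log.csv"), "a") as f:
            f.write(entry + "\n")
    return entry


with tempfile.TemporaryDirectory() as proj_dir:
    log_status_sketch("scan", proj_dir, "Success", jobid="1234")
    log_status_sketch("compute", proj_dir, "Pending")
    log_status_sketch("validate", proj_dir, "Pending", dryrun=True)  # not written
    with open(os.path.join(proj_dir, "status_log.csv")) as f:
        rows = [line.rstrip("\n").split(",") for line in f]
```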