Padocc Utility Scripts
Utilities
- class pipeline.utils.BypassSwitch(switch='DBSCLR')
Bases: object
Class to represent all bypass switches throughout the pipeline. Requires a switch string which is used to enable/disable specific pipeline switches stored in this class.
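This section does not say what the individual letters of the default switch string 'DBSCLR' mean. The sketch below shows one way such a class could work. It assumes that each character is an independent on/off switch. The class name and the is_enabled method are hypothetical.

```python
class BypassSwitchSketch:
    """Hypothetical sketch: each character of the switch string is one enabled flag."""

    def __init__(self, switch: str = "DBSCLR") -> None:
        # Assumption: letters are case-insensitive and order does not matter.
        self.switch = switch
        self._enabled = set(switch.upper())

    def is_enabled(self, letter: str) -> bool:
        # A switch is on if its letter appears anywhere in the switch string.
        return letter.upper() in self._enabled


bypass = BypassSwitchSketch("DB")
print(bypass.is_enabled("D"), bypass.is_enabled("S"))  # True False
```

Storing the letters in a set makes each lookup constant-time and ignores duplicate letters.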
- pipeline.utils.find_zarrays(refs: dict) → dict
Quickly extract all the .zarray components of a reference set.
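The sketch below assumes the standard Kerchunk version-1 layout, where a top-level "refs" mapping holds keys such as "tas/.zarray" that carry the array metadata. It is not the pipeline's own code. The function name and the example variable are hypothetical.

```python
import json


def find_zarrays_sketch(refs: dict) -> dict:
    """Hypothetical sketch: keep only the .zarray entries of a reference set."""
    # Accept either the full Kerchunk document or its inner "refs" mapping.
    mapping = refs.get("refs", refs)
    # Keys such as "tas/.zarray" hold per-array metadata as JSON strings.
    return {key: value for key, value in mapping.items() if key.endswith(".zarray")}


refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "tas/.zarray": json.dumps({"shape": [12, 180, 360]}),
        "tas/0.0.0": ["data.nc", 1024, 2048],
    },
}
zarrays = find_zarrays_sketch(refs)
print(sorted(zarrays))  # ['tas/.zarray']
```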
- pipeline.utils.format_str(string: str, length: int, concat=False) → str
Format a string to the correct length.
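Here is a minimal sketch of fixed-width formatting. It assumes that "correct length" means truncating long strings and padding short ones with trailing spaces. It does not model the concat flag, because its behaviour is not documented here. The function name is hypothetical.

```python
def format_str_sketch(string, length: int) -> str:
    """Hypothetical sketch: force a string to exactly `length` characters."""
    # Truncate anything too long, then pad with trailing spaces if too short.
    return str(string)[:length].ljust(length)


row = format_str_sketch("proj_code", 12) + "|" + format_str_sketch("a_very_long_status", 8) + "|"
print(row)  # proj_code   |a_very_l|
```

Fixed widths like this keep columns aligned when several project codes and statuses are printed one per line.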
- pipeline.utils.get_attribute(env: str, args, var: str) → str
Get a value either from an environment variable or from a passed argument. Finds the value of the variable in the environment or in the ParseArgs object, or reports failure.
- Parameters:
env – (str) Name of environment variable.
args – (obj) Set of command line arguments supplied by argparse.
var – (str) Name of argparse parameter to check.
- Returns:
Value of either environment variable or argparse value.
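The lookup can be sketched as follows. The text above does not specify which source takes precedence or how failure is reported. This sketch assumes that an explicitly passed argument wins over the environment, and it raises a ValueError when neither source has a value. The function name and the WORKDIR variable are hypothetical.

```python
import argparse
import os


def get_attribute_sketch(env: str, args, var: str) -> str:
    """Hypothetical sketch: prefer the argparse value, fall back to the environment."""
    # Assumption: an explicitly passed argument wins over the environment variable.
    value = getattr(args, var, None)
    if value:
        return value
    if env in os.environ:
        return os.environ[env]
    # Assumption: failure is reported by raising an exception.
    raise ValueError(f"No value found: set ${env} or pass --{var}")


# Example with a hypothetical WORKDIR variable.
os.environ["WORKDIR"] = "/tmp/padocc-work"
from_env = get_attribute_sketch("WORKDIR", argparse.Namespace(workdir=None), "workdir")
from_args = get_attribute_sketch("WORKDIR", argparse.Namespace(workdir="/data/work"), "workdir")
print(from_env, from_args)  # /tmp/padocc-work /data/work
```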
- pipeline.utils.get_blacklist(group: str, workdir: str) → list
Returns a list of the blacklisted project codes for the given group.
- Parameters:
group – (str) Name of current group or path to group directory (groupdir) in which case workdir can be left as None.
workdir – (str) Path to working directory or None. If this is None, group value will be assumed as the groupdir path.
- Returns:
A list of codes if the file is found, an empty list otherwise.
- pipeline.utils.get_codes(group: str, workdir: str, filename: str, extension='.txt') → list
Returns a list of the project codes given a filename (repeat id)
- Parameters:
group – (str) Name of current group or path to group directory (groupdir) in which case workdir can be left as None.
workdir – (str) Path to working directory or None. If this is None, group value will be assumed as the groupdir path.
filename – (str) Name of the text file to access within the group (or the path to the text file within the groupdir).
extension – (str) File extension, for the specific case of non-text files.
- Returns:
A list of codes if the file is found, an empty list otherwise.
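The following is a minimal sketch of reading a codes file, with one project code per line. The workdir/groups/<group> layout is an assumption made only for illustration, and so is the function name. The real directory structure is not given in this section.

```python
import os
import tempfile


def get_codes_sketch(group, workdir, filename, extension=".txt"):
    """Hypothetical sketch: read one project code per line from a group file."""
    if workdir is None:
        # No workdir: treat `group` as the path to the group directory itself.
        groupdir = group
    else:
        # Hypothetical layout, assumed for illustration only.
        groupdir = os.path.join(workdir, "groups", group)
    codefile = os.path.join(groupdir, filename + extension)
    if not os.path.isfile(codefile):
        return []
    with open(codefile) as f:
        # Strip newlines and skip blank lines.
        return [line.strip() for line in f if line.strip()]


with tempfile.TemporaryDirectory() as groupdir:
    with open(os.path.join(groupdir, "repeat1.txt"), "w") as f:
        f.write("code_a\n\ncode_b\n")
    codes = get_codes_sketch(groupdir, None, "repeat1")
    missing = get_codes_sketch(groupdir, None, "no_such_file")

print(codes, missing)  # ['code_a', 'code_b'] []
```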
- pipeline.utils.get_proj_dir(proj_code: str, workdir: str, groupID: str) → str
Assemble the project directory path, which depends on groupID. This may become redundant in the future if a ‘serial’ directory is added.
- pipeline.utils.get_proj_file(proj_dir: str, proj_file: str) → dict
Returns the contents of a project file within a project code directory.
- Parameters:
proj_dir – (str) The project code directory path.
proj_file – (str) Name of a file to access within the project directory.
- Returns:
A dictionary of the contents of a json file or None if there are problems.
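The "return None if there are problems" behaviour can be sketched with the standard json module. In this sketch, a missing, unreadable or malformed file all map to None. The function name and the example file name are hypothetical.

```python
import json
import os
import tempfile


def get_proj_file_sketch(proj_dir: str, proj_file: str):
    """Hypothetical sketch: load a JSON file from a project directory."""
    path = os.path.join(proj_dir, proj_file)
    try:
        with open(path) as f:
            return json.load(f)
    except (OSError, json.JSONDecodeError):
        # Missing, unreadable or malformed files all map to None.
        return None


with tempfile.TemporaryDirectory() as proj_dir:
    with open(os.path.join(proj_dir, "example.json"), "w") as f:
        json.dump({"proj_code": "example"}, f)
    loaded = get_proj_file_sketch(proj_dir, "example.json")
    absent = get_proj_file_sketch(proj_dir, "absent.json")

print(loaded, absent)  # {'proj_code': 'example'} None
```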
- pipeline.utils.mem_to_val(value: str) → float
Convert a memory value given as a string into a number of bytes.
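This section does not list the accepted units. The sketch below assumes that values carry an optional KB/MB/GB/TB/B suffix and that the units are binary multiples (1 KB = 1024 bytes). Both assumptions may differ from the real function, and the function name is hypothetical.

```python
def mem_to_val_sketch(value: str) -> float:
    """Hypothetical sketch: convert strings such as '2GB' into bytes."""
    # Assumption: binary multiples; "B" is checked last so "KB" etc. match first.
    suffixes = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4, "B": 1}
    value = value.strip().upper()
    for suffix, factor in suffixes.items():
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    # No suffix: treat the number as a plain byte count.
    return float(value)


print(mem_to_val_sketch("2GB"), mem_to_val_sketch("1.5 MB"), mem_to_val_sketch("512"))
```

Dictionaries preserve insertion order, so the loop tests the two-letter suffixes before the bare "B".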
- pipeline.utils.open_kerchunk(kfile: str, logger, isparq=False, retry=False, attempt=1, **kwargs) → Dataset
Open a Kerchunk file in JSON or Parquet format.
- Parameters:
kfile – (str) Path to a kerchunk file (or https link if using a remote file)
logger – (obj) Logging object for info/debug/error messages.
isparq – (bool) Switch for using Parquet (True) or JSON (False) format.
remote_protocol – (str) ‘file’ for local file paths, ‘http’ for remote links (passed via **kwargs).
- Returns:
An xarray virtual dataset constructed from the Kerchunk file
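For the JSON case, a common way to open a Kerchunk file is through xarray's zarr engine on fsspec's "reference://" filesystem. The sketch below shows that pattern. It covers JSON only: the Parquet path, and the retry and attempt handling, are not modelled. Both function names are hypothetical. Opening a dataset needs xarray, zarr and fsspec installed, so xarray is imported only inside the opener.

```python
def reference_backend_kwargs(kfile: str, remote_protocol: str = "file") -> dict:
    """Hypothetical helper: backend arguments for reading a Kerchunk JSON file."""
    return {
        # Kerchunk reference sets are not consolidated zarr stores.
        "consolidated": False,
        # "fo" points fsspec's reference filesystem at the Kerchunk file;
        # remote_protocol says how the referenced chunks are fetched.
        "storage_options": {"fo": kfile, "remote_protocol": remote_protocol},
    }


def open_kerchunk_json_sketch(kfile: str, remote_protocol: str = "file"):
    """Hypothetical sketch: open a Kerchunk JSON file as a virtual xarray Dataset."""
    import xarray as xr  # imported lazily so the helper above works without it

    return xr.open_dataset(
        "reference://",
        engine="zarr",
        backend_kwargs=reference_backend_kwargs(kfile, remote_protocol),
    )


kwargs = reference_backend_kwargs("example.json")
print(kwargs)
# With the libraries installed and a real Kerchunk file:
# ds = open_kerchunk_json_sketch("example.json")
```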
- pipeline.utils.set_codes(group: str, workdir: str, filename: str, contents, extension='.txt', overwrite=0) → None
Writes a set of project codes to a given file (repeat id).
- Parameters:
group – (str) Name of current group or path to group directory (groupdir) in which case workdir can be left as None.
workdir – (str) Path to working directory or None. If this is None, group value will be assumed as the groupdir path.
filename – (str) Name of the text file to access within the group (or the path to the text file within the groupdir).
contents – (str) Combined contents to write to the file.
extension – (str) File extension, for the specific case of non-text files.
overwrite – (str) Specifier for the built-in Python open() method: either completely overwrite the file contents or append to the existing values.
- Returns:
None
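The overwrite/append choice can be sketched with the two standard open() modes. The sketch assumes that a truthy overwrite maps to mode "w" and a falsy one to mode "a". It also assumes a hypothetical workdir/groups/<group> layout. The function name is hypothetical as well.

```python
import os
import tempfile


def set_codes_sketch(group, workdir, filename, contents, extension=".txt", overwrite=0):
    """Hypothetical sketch: write or append project codes to a group file."""
    if workdir is None:
        groupdir = group
    else:
        # Hypothetical layout, assumed for illustration only.
        groupdir = os.path.join(workdir, "groups", group)
    # Assumption: truthy overwrite replaces the file, otherwise contents are appended.
    mode = "w" if overwrite else "a"
    with open(os.path.join(groupdir, filename + extension), mode) as f:
        f.write(contents)


with tempfile.TemporaryDirectory() as groupdir:
    set_codes_sketch(groupdir, None, "repeat1", "stale\n", overwrite=1)
    set_codes_sketch(groupdir, None, "repeat1", "code_a\n", overwrite=1)  # replaces "stale"
    set_codes_sketch(groupdir, None, "repeat1", "code_b\n")  # appends
    with open(os.path.join(groupdir, "repeat1.txt")) as f:
        text = f.read()

print(repr(text))  # 'code_a\ncode_b\n'
```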
- pipeline.utils.set_proj_file(proj_dir: str, proj_file: str, contents: dict, logger: Logger) → None
Overwrite the contents of a project file within a project code directory.
- Parameters:
proj_dir – (str) The project code directory path.
proj_file – (str) Name of a file to access within the project directory.
contents – (dict) Dictionary to write into a JSON-format config file within the project directory.
logger – (obj) Logging object for info/debug/error messages.
- Returns:
None
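The write side mirrors the read side. The sketch below opens the file in "w" mode, so any previous contents are replaced completely. It logs the write at debug level. The function name and the example file name are hypothetical.

```python
import json
import logging
import os
import tempfile


def set_proj_file_sketch(proj_dir: str, proj_file: str, contents: dict, logger: logging.Logger) -> None:
    """Hypothetical sketch: overwrite a JSON config file in a project directory."""
    path = os.path.join(proj_dir, proj_file)
    # Mode "w" truncates the file, so previous contents are fully replaced.
    with open(path, "w") as f:
        json.dump(contents, f)
    logger.debug("Wrote %s", path)


logger = logging.getLogger("set-proj-file-sketch")
with tempfile.TemporaryDirectory() as proj_dir:
    set_proj_file_sketch(proj_dir, "example.json", {"status": "old"}, logger)
    set_proj_file_sketch(proj_dir, "example.json", {"status": "new"}, logger)
    with open(os.path.join(proj_dir, "example.json")) as f:
        stored = json.load(f)

print(stored)  # {'status': 'new'}
```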
Logging
- pipeline.logs.init_logger(verbose, mode, name, fh=None, logid=None)
Initialise and configure a logger object with formatting.
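Here is a minimal sketch of this kind of setup using the standard logging module. The mapping of verbose to levels (0 to WARNING, 1 to INFO, 2 or more to DEBUG) and the format string are assumptions. The mode and logid parameters are not modelled. The function name is hypothetical.

```python
import logging


def init_logger_sketch(verbose: int, name: str, fh=None) -> logging.Logger:
    """Hypothetical sketch: configure a named logger with a formatted handler."""
    # Assumed mapping of verbosity to level: 0 -> WARNING, 1 -> INFO, 2+ -> DEBUG.
    levels = [logging.WARNING, logging.INFO, logging.DEBUG]
    level = levels[min(verbose, len(levels) - 1)]

    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Remove and close handlers from earlier calls so messages are not duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    # Log to a file if a path is given, otherwise to stderr.
    handler = logging.FileHandler(fh) if fh else logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(levelname)s [%(name)s]: %(message)s"))
    logger.addHandler(handler)
    return logger


logger = init_logger_sketch(1, "init-logger-sketch")
logger.info("Logger configured")  # shown at verbose=1
logger.debug("Hidden at verbose=1")  # filtered out by the INFO level
```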
- pipeline.logs.log_status(phase, proj_dir, status, logger, jobid='', dryrun='')
Find the status file for this project code and add a new status entry. Each entry should be of the form: phase, status, time, jobid.
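The entry format above can be sketched as one comma-separated line appended per call. Several details are assumptions: the file name status_log.csv, the timestamp format, and treating a dry run as "log but do not write". The function name is hypothetical.

```python
import logging
import os
import tempfile
from datetime import datetime


def log_status_sketch(phase, proj_dir, status, logger, jobid="", dryrun=False):
    """Hypothetical sketch: append a 'phase,status,time,jobid' line to a status file."""
    if dryrun:
        # Assumption: a dry run only reports the entry, without writing it.
        logger.info("Dry run: skipped status entry %s,%s", phase, status)
        return
    # Hypothetical file name for the per-project status log.
    status_file = os.path.join(proj_dir, "status_log.csv")
    timestamp = datetime.now().strftime("%H:%M %d/%m/%y")
    with open(status_file, "a") as f:
        f.write(f"{phase},{status},{timestamp},{jobid}\n")


logger = logging.getLogger("log-status-sketch")
with tempfile.TemporaryDirectory() as proj_dir:
    log_status_sketch("scan", proj_dir, "Success", logger, jobid="1234")
    log_status_sketch("compute", proj_dir, "Pending", logger, dryrun=True)
    with open(os.path.join(proj_dir, "status_log.csv")) as f:
        lines = f.read().splitlines()

print(lines)
```

Appending rather than rewriting keeps the full history of each phase, so the latest status is simply the last line.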