Padocc Filehandlers

Filehandlers are an integral component of PADOCC on the filesystem. The filehandlers connect directly to files within the pipeline directories for different groups and projects and provide a seamless environment for fetching and saving values to these files.

Filehandlers behave like their respective data types in most, if not all, methods. For example, the JSONFileHandler acts like a dictionary, with extra methods to close and save the loaded data. Filehandlers can also be migrated or removed from the filesystem as part of other processes.
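As a rough illustration of the dictionary-like behaviour described above, the sketch below defines a hypothetical stand-in class, `SketchJSONHandler`. It is not the padocc implementation: the class name, the eager loading, and the file name `detail.json` are assumptions made for the example. It only shows the general pattern of a dict-like object that saves its content on `close()`.

```python
import json
import os
import tempfile


class SketchJSONHandler:
    """Hypothetical stand-in for a dict-like JSON filehandler (not padocc code)."""

    def __init__(self, dir: str, filename: str):
        self.filepath = os.path.join(dir, filename)
        self._value = {}
        # Load any existing content straight away (an assumption for brevity).
        if os.path.isfile(self.filepath):
            with open(self.filepath) as f:
                self._value = json.load(f)

    # Dictionary-style access is delegated to the loaded content.
    def __getitem__(self, key):
        return self._value[key]

    def __setitem__(self, key, value):
        self._value[key] = value

    def __contains__(self, key):
        return key in self._value

    def get(self, index=None, default=None):
        # With no index, return the whole dictionary.
        if index is None:
            return self._value
        return self._value.get(index, default)

    def close(self):
        # Saving is explicit: nothing reaches the disk until close() is called.
        with open(self.filepath, "w") as f:
            json.dump(self._value, f)


workdir = tempfile.mkdtemp()
fh = SketchJSONHandler(workdir, "detail.json")
fh["version"] = "1.0"
fh.close()

# A second handler attached to the same file sees the saved content.
reloaded = SketchJSONHandler(workdir, "detail.json")
```

The point of the pattern is that callers never pass a path to the save step, since the handler already knows which file it is attached to.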

class padocc.core.filehandlers.CSVFileHandler(dir: str, filename: str, **kwargs)

Bases: ListFileHandler

CSV File handler for padocc config files

update_status(phase: str, status: str, jobid: str = '') None

Update formatted status for this log with the phase and status

Parameters:
  • phase – (str) The phase for which this project is being operated.

  • status – (str) The status of the current run (e.g. Success, Failed, Fatal)

  • jobid – (str) The jobID of this run if present.

class padocc.core.filehandlers.FileIOMixin(dir: str, filename: str, logger: Logger | FalseLogger | None = None, label: str | None = None, fh: str | None = None, logid: str | None = None, dryrun: bool = False, forceful: bool = False, verbose: int = 0)

Bases: LoggedOperation

Class for containing Filehandler behaviour which is exactly identical for all Filehandler subclasses.

Identical behaviour

  1. Contains:

    'item' in fh

  2. Create/save file:

    Filehandlers intrinsically know the file they are attached to, so no arguments are passed to either of these.

    fh.create_file()
    fh.close()

  3. Get/set:

    contents = fh.get()
    fh.set(contents)

create_file() None

Create the file if not on dryrun.

property file: str

Returns the full filename attribute.

file_exists() bool

Return True if the file is found.

property filepath: str

Returns the full filepath attribute.

remove_file() None

Remove the file on the filesystem if not on dryrun
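The dry-run guard on `create_file()` and `remove_file()` can be sketched as follows. `SketchFileIO` is a hypothetical stand-in written for this example, not the padocc `FileIOMixin`; how padocc actually creates or removes the file is not specified here, so the bodies of those methods are assumptions.

```python
import os
import tempfile


class SketchFileIO:
    """Hypothetical stand-in showing dry-run guarded file operations."""

    def __init__(self, dir: str, filename: str, dryrun: bool = False):
        self.dir = dir
        self.filename = filename
        self.dryrun = dryrun

    @property
    def filepath(self) -> str:
        return os.path.join(self.dir, self.filename)

    def file_exists(self) -> bool:
        return os.path.isfile(self.filepath)

    def create_file(self) -> None:
        # On a dry run, nothing is written to the filesystem.
        if self.dryrun:
            return
        # Append mode creates an empty file without truncating an existing one.
        with open(self.filepath, "a"):
            pass

    def remove_file(self) -> None:
        # On a dry run, or if there is nothing to remove, do nothing.
        if self.dryrun or not self.file_exists():
            return
        os.remove(self.filepath)


workdir = tempfile.mkdtemp()

dry = SketchFileIO(workdir, "notes.txt", dryrun=True)
dry.create_file()
dry_created = dry.file_exists()

live = SketchFileIO(workdir, "notes.txt")
live.create_file()
live_created = live.file_exists()
live.remove_file()
live_removed = not live.file_exists()
```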

class padocc.core.filehandlers.GenericStore(parent_dir: str, store_name: str, metadata_name: str = '.zattrs', extension: str = 'zarr', logger: Logger | FalseLogger | None = None, label: str | None = None, fh: str | None = None, logid: str | None = None, dryrun: bool = False, forceful: bool = False, verbose: int = 0)

Bases: LoggedOperation

Filehandler for Generic stores in Padocc - enables Filesystem operations on component files.

clear() None

Remove all components of the store

open(engine: str = 'zarr', **open_kwargs) Dataset

Open the store as a dataset (READ_ONLY)

property store_path: str

Assemble the store path
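One plausible way to assemble a store path from the constructor arguments is sketched below. The helper names `sketch_store_path` and `sketch_metadata_path` are hypothetical, and the `parent_dir/store_name.extension` layout is an assumption based only on the parameter names and defaults in the signature above.

```python
import os


def sketch_store_path(parent_dir: str, store_name: str, extension: str = "zarr") -> str:
    # Assumed layout: <parent_dir>/<store_name>.<extension>
    return os.path.join(parent_dir, f"{store_name}.{extension}")


def sketch_metadata_path(store_path: str, metadata_name: str = ".zattrs") -> str:
    # Assumed layout: the metadata file sits directly inside the store directory.
    return os.path.join(store_path, metadata_name)


example_store = sketch_store_path("pipeline", "my_project")
example_meta = sketch_metadata_path(example_store)
```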

class padocc.core.filehandlers.JSONFileHandler(dir: str, filename: str, conf: dict | None = None, init_value: dict | None = None, **kwargs)

Bases: FileIOMixin

JSON File handler for padocc config files.

close() None

Save the content of the filehandler

create_file() None

JSON files require entry of a single dict on creation

get(index: str | None = None, default: str | None = None) str | dict | None

Safe method to get a value from this filehandler

set(value: dict) None

Set the value of the whole dictionary.

class padocc.core.filehandlers.KerchunkFile(dir: str, filename: str, conf: dict | None = None, init_value: dict | None = None, **kwargs)

Bases: JSONFileHandler

Filehandler for Kerchunk file, enables substitution/replacement for local/remote links, and updating content.

Add the download link to this Kerchunk File

add_kerchunk_history(version_no: str) None

Add kerchunk variables to the metadata for this dataset, including creation/update date and version/revision number.

class padocc.core.filehandlers.KerchunkStore(parent_dir: str, store_name: str, **kwargs)

Bases: GenericStore

Filehandler for Kerchunk stores using parquet in PADOCC. Enables setting metadata attributes and will allow combining stores in future.

open(*args, **parquet_kwargs) Dataset

Open the Parquet Store as an xarray dataset

class padocc.core.filehandlers.ListFileHandler(dir: str, filename: str, extension: str | None = None, init_value: list | None = None, **kwargs)

Bases: FileIOMixin

Filehandler for string-based Lists in Padocc

append(newvalue: str) None

Add a new value to the internal list

close() None

Save the content of the filehandler

get() list

Get the current value

set(value: list) None

Reset the value as a whole for this filehandler.
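The list-based methods above can be pictured with a small stand-in. `SketchListHandler` is hypothetical and is not the padocc class; in particular, storing one string per line is an assumption about the on-disk format.

```python
import os
import tempfile


class SketchListHandler:
    """Hypothetical stand-in for a string-list filehandler (not padocc code)."""

    def __init__(self, dir: str, filename: str, init_value=None):
        self.filepath = os.path.join(dir, filename)
        self._value = list(init_value or [])
        # If no initial value is given, load whatever is already on disk.
        if not self._value and os.path.isfile(self.filepath):
            with open(self.filepath) as f:
                self._value = [line.rstrip("\n") for line in f]

    def __contains__(self, item: str) -> bool:
        return item in self._value

    def append(self, newvalue: str) -> None:
        self._value.append(newvalue)

    def get(self) -> list:
        return self._value

    def set(self, value: list) -> None:
        # Replace the whole list rather than editing it in place.
        self._value = list(value)

    def close(self) -> None:
        # Assumed format: one entry per line.
        with open(self.filepath, "w") as f:
            f.write("\n".join(self._value))


workdir = tempfile.mkdtemp()
files = SketchListHandler(workdir, "allfiles.txt", init_value=["a.nc"])
files.append("b.nc")
files.close()

reloaded_list = SketchListHandler(workdir, "allfiles.txt").get()
```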

class padocc.core.filehandlers.LogFileHandler(dir: str, filename: str, extra_path: str = '', **kwargs)

Bases: ListFileHandler

Log File handler for padocc phase logs.

property file: str

Returns the full filename attribute.

class padocc.core.filehandlers.ZarrStore(parent_dir: str, store_name: str, **kwargs)

Bases: GenericStore

Filehandler for Zarr stores in PADOCC. Enables manipulation of Zarr store on filesystem and setting metadata attributes.

open(*args, **zarr_kwargs) Dataset

Open the ZarrStore as an xarray dataset

Utilities

class padocc.core.utils.BypassSwitch(switch='D')

Bases: object

Class to represent all bypass switches throughout the pipeline. Requires a switch string which is used to enable/disable specific pipeline switches stored in this class.

padocc.core.utils.find_zarrays(refs: dict) dict

Quick way of extracting all the zarray components of a ref set.
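A minimal sketch of this kind of extraction is shown below, assuming the usual Kerchunk layout in which entries are nested under a top-level `refs` key and array metadata keys end in `.zarray`. The function name `sketch_find_zarrays` and the example reference set are made up for illustration and may differ from how padocc implements it.

```python
def sketch_find_zarrays(refs: dict) -> dict:
    # Kerchunk reference sets conventionally nest entries under "refs";
    # fall back to the top level if that key is absent.
    entries = refs.get("refs", refs)
    return {key: val for key, val in entries.items() if key.endswith(".zarray")}


# Hypothetical reference set with one variable.
example_refs = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "tas/.zarray": '{"shape": [12, 90, 180]}',
        "tas/.zattrs": '{"units": "K"}',
        "tas/0.0.0": ["file.nc", 1024, 2048],
    },
}

zarrays = sketch_find_zarrays(example_refs)
```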

padocc.core.utils.format_str(string: str, length: int, concat: bool = False, shorten: bool = False) str

Simple function to format a string to a correct length.
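The sketch below shows one reading of the parameters, and every behaviour in it is an assumption: short strings are padded with trailing spaces, `concat` truncates long strings with an ellipsis, and `shorten` suppresses the padding. `sketch_format_str` is a hypothetical name, and the real function may treat these flags differently.

```python
def sketch_format_str(string: str, length: int, concat: bool = False, shorten: bool = False) -> str:
    string = str(string)
    if len(string) < length and not shorten:
        # Pad with trailing spaces up to the requested length.
        string = string.ljust(length)
    elif len(string) > length and concat:
        # Truncate and mark the cut with an ellipsis, keeping the total length.
        string = string[: length - 3] + "..."
    return string


padded = sketch_format_str("abc", 6)
truncated = sketch_format_str("abcdefghij", 8, concat=True)
unpadded = sketch_format_str("abc", 6, shorten=True)
```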

padocc.core.utils.get_attribute(env: str, value: str) str

Return the value of a variable taken from the environment or from the passed argument (ParseArgs object), reporting a failure if it cannot be found.

Parameters:
  • env – (str) Name of environment variable.

  • args – (obj) Set of command line arguments supplied by argparse.

  • var – (str) Name of argparse parameter to check.

Returns:

Value of either environment variable or argparse value.
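The lookup described above might be sketched as follows. Two things here are assumptions rather than documented behaviour: that an explicitly passed value takes precedence over the environment, and that failure is reported by raising an exception. `sketch_get_attribute` and the variable name `SKETCH_WORKDIR` are hypothetical.

```python
import os


def sketch_get_attribute(env: str, value: str = None) -> str:
    # Assumed precedence: an explicitly passed value wins over the environment.
    if value:
        return value
    if os.environ.get(env):
        return os.environ[env]
    # Assumed failure mode: raise, so the caller cannot silently continue.
    raise ValueError(f"No value supplied and environment variable {env} is not set")


os.environ["SKETCH_WORKDIR"] = "/tmp/from_env"

from_env = sketch_get_attribute("SKETCH_WORKDIR")
from_arg = sketch_get_attribute("SKETCH_WORKDIR", "/tmp/from_arg")

try:
    sketch_get_attribute("SKETCH_UNSET_VARIABLE")
    missing_reported = False
except ValueError:
    missing_reported = True
```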

padocc.core.utils.mem_to_val(value: str) float

Convert a memory value given as a string to a number of bytes (returned as a float).
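A conversion of this kind could look like the sketch below. The input format (a number, a space, then a unit such as `GB`) and the use of decimal multiples are assumptions, since the accepted format is not documented here; `sketch_mem_to_val` and `SKETCH_SUFFIXES` are hypothetical names.

```python
# Assumed decimal (SI) multiples; binary multiples would use 1024 instead.
SKETCH_SUFFIXES = {
    "B": 1,
    "KB": 1000,
    "MB": 1000**2,
    "GB": 1000**3,
    "TB": 1000**4,
    "PB": 1000**5,
}


def sketch_mem_to_val(value: str) -> float:
    # Assumed input format: "<number> <unit>", for example "2 GB".
    number, unit = value.split()
    return float(number) * SKETCH_SUFFIXES[unit.upper()]


two_gb = sketch_mem_to_val("2 GB")
small = sketch_mem_to_val("1.5 KB")
```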

padocc.core.utils.open_kerchunk(kfile: str, logger, isparq=False, retry=False, attempt=1, **kwargs) Dataset

Open kerchunk file from JSON/parquet formats

Parameters:
  • kfile – (str) Path to a kerchunk file (or https link if using a remote file)

  • logger – (obj) Logging object for info/debug/error messages.

  • isparq – (bool) Switch for using Parquet or JSON Format

  • remote_protocol – (str) ‘file’ for local filepaths, ‘http’ for remote links.

Returns:

An xarray virtual dataset constructed from the Kerchunk file

Logging

class padocc.core.logs.FalseLogger

Bases: object

Supplementary class where a logger is not wanted but is required for some operations.
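The idea of a placeholder logger can be illustrated with a no-op class. `SketchFalseLogger` and `sketch_total` are hypothetical and written for this example; the set of methods on the real `FalseLogger` is not listed here, so the four shown are an assumption.

```python
class SketchFalseLogger:
    """Hypothetical no-op logger: accepts log calls and discards them."""

    def debug(self, message: str) -> None:
        pass

    def info(self, message: str) -> None:
        pass

    def warning(self, message: str) -> None:
        pass

    def error(self, message: str) -> None:
        pass


def sketch_total(values, logger=None):
    # Code that logs unconditionally can still run when no logger is wanted.
    logger = logger or SketchFalseLogger()
    logger.info(f"Summing {len(values)} values")
    return sum(values)


total = sketch_total([1, 2, 3])
```

This avoids scattering `if logger is not None` checks through every operation that logs.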

class padocc.core.logs.LoggedOperation(logger: Logger | FalseLogger | None = None, label: str | None = None, fh: str | None = None, logid: str | None = None, verbose: int = 0)

Bases: object

Allows inheritance of logger objects without creating new ones.

padocc.core.logs.init_logger(verbose: int, name: str, fh: str = None, logid: str = None) Logger

Initialise and configure a logger object with standardised formatting.

Parameters:
  • verbose – (int) Level of verbosity for log messages (see core.init_logger).

  • name – (str) The label to apply to the logger object.

  • fh – (str) Path to logfile for logger object generated in this specific process.

  • logid – (str) ID of the process within a subset, which is then added to the name of the logger - prevents multiple processes with different logfiles getting loggers confused.

Returns:

A new logger object.
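A logger initialiser with these parameters can be sketched with the standard `logging` module. The mapping from `verbose` to logging levels, the log format, and the way `logid` is appended to the name are all assumptions made for the example; `sketch_init_logger` is a hypothetical name and not the padocc function.

```python
import logging

# Assumed mapping: 0 -> WARNING, 1 -> INFO, 2 or more -> DEBUG.
SKETCH_LEVELS = [logging.WARNING, logging.INFO, logging.DEBUG]


def sketch_init_logger(verbose: int, name: str, fh: str = None, logid: str = None) -> logging.Logger:
    level = SKETCH_LEVELS[min(max(verbose, 0), len(SKETCH_LEVELS) - 1)]

    # Appending the process ID keeps parallel processes from sharing a logger.
    if logid is not None:
        name = f"{name}_{logid}"

    logger = logging.getLogger(name)
    logger.setLevel(level)

    # Write to the logfile if one is given, otherwise to the console.
    handler = logging.FileHandler(fh) if fh else logging.StreamHandler()
    handler.setLevel(level)
    handler.setFormatter(logging.Formatter("%(levelname)s [%(name)s]: %(message)s"))

    # Drop handlers left over from earlier calls so messages are not duplicated.
    logger.handlers.clear()
    logger.addHandler(handler)
    return logger


log = sketch_init_logger(2, "scan", logid="7")
log.debug("Logger configured")
```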

padocc.core.logs.reset_file_handler(logger: Logger, verbose: int, fh: str) Logger

Reset the file handler for an existing logger object.

Parameters:
  • logger – (logging.Logger) An existing logger object.

  • verbose – (int) The logging level to reapply.

  • fh – (str) Address to new file handler.

Returns:

A new logger object with a new file handler.
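Swapping the file handler on an existing logger might look like the sketch below, which uses only the standard `logging` module. `sketch_reset_file_handler`, the level mapping, and the log format are assumptions made for illustration, not the padocc implementation.

```python
import logging
import os
import tempfile


def sketch_reset_file_handler(logger: logging.Logger, verbose: int, fh: str) -> logging.Logger:
    # Assumed mapping: 0 -> WARNING, 1 -> INFO, 2 or more -> DEBUG.
    levels = [logging.WARNING, logging.INFO, logging.DEBUG]
    level = levels[min(max(verbose, 0), len(levels) - 1)]

    # Remove and close existing file handlers; other handlers are left alone.
    for handler in list(logger.handlers):
        if isinstance(handler, logging.FileHandler):
            logger.removeHandler(handler)
            handler.close()

    new_handler = logging.FileHandler(fh)
    new_handler.setLevel(level)
    new_handler.setFormatter(logging.Formatter("%(levelname)s [%(name)s]: %(message)s"))
    logger.addHandler(new_handler)
    logger.setLevel(level)
    return logger


workdir = tempfile.mkdtemp()
first_log = os.path.join(workdir, "first.log")
second_log = os.path.join(workdir, "second.log")

logger = logging.getLogger("sketch_reset")
sketch_reset_file_handler(logger, 1, first_log)
sketch_reset_file_handler(logger, 1, second_log)
logger.info("written to the second file only")

# Flush so the message is on disk before the files are read back.
for handler in logger.handlers:
    handler.flush()

with open(first_log) as f:
    first_text = f.read()
with open(second_log) as f:
    second_text = f.read()
```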