Padocc Filehandlers
Filehandlers are an integral component of PADOCC on the filesystem. The filehandlers connect directly to files within the pipeline directories for different groups and projects and provide a seamless environment for fetching and saving values to these files.
Filehandlers behave like their respective data types in most methods. For example, the JSONFileHandler acts like a dictionary, with extra methods to close and save the loaded data. Filehandlers can also be migrated or removed from the filesystem as part of other processes.
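The dictionary-like behaviour described above can be illustrated with a minimal, standalone sketch. The class `DictLikeJSONFile` and the filename `detail.json` are hypothetical and are not part of PADOCC; the sketch only shows the general pattern of a handler that loads a JSON file, is used like a dict, and writes back to disk on `close()`.

```python
import json
import os
import tempfile


class DictLikeJSONFile:
    """Hypothetical stand-in for a dict-like JSON filehandler (not PADOCC code)."""

    def __init__(self, dir: str, filename: str):
        self.filepath = os.path.join(dir, filename)
        # Load existing content if the file is already on disk.
        if os.path.isfile(self.filepath):
            with open(self.filepath) as f:
                self._value = json.load(f)
        else:
            self._value = {}

    def __getitem__(self, key):
        return self._value[key]

    def __setitem__(self, key, value):
        self._value[key] = value

    def __contains__(self, key):
        return key in self._value

    def get(self, index=None, default=None):
        # Return the whole dictionary when no index is given.
        if index is None:
            return self._value
        return self._value.get(index, default)

    def close(self):
        # Persist the in-memory dictionary to disk.
        with open(self.filepath, "w") as f:
            json.dump(self._value, f)


workdir = tempfile.mkdtemp()
fh = DictLikeJSONFile(workdir, "detail.json")
fh["version"] = "1.0"
fh.close()

# A second handler attached to the same file sees the saved content.
reloaded = DictLikeJSONFile(workdir, "detail.json")
```

Nothing is written until `close()` is called, which is why the second handler is only created after the first has been closed.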
- class padocc.core.filehandlers.CSVFileHandler(dir: str, filename: str, **kwargs)
Bases:
ListFileHandler
CSV File handler for padocc config files
- update_status(phase: str, status: str, jobid: str = '') None
Update formatted status for this log with the phase and status
- Parameters:
phase – (str) The phase for which this project is being operated.
status – (str) The status of the current run (e.g. Success, Failed, Fatal)
jobid – (str) The jobID of this run if present.
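As a rough illustration of what a formatted status entry could look like, the sketch below builds one comma-separated row per update. The function `format_status_row`, the column order, and the timestamp format are all assumptions made for this example; the layout PADOCC actually writes may differ.

```python
from datetime import datetime


def format_status_row(phase: str, status: str, jobid: str = "") -> str:
    """Build one CSV row; the column order here is an assumption."""
    # Commas and newlines inside the status would break the row, so replace them.
    clean_status = status.replace(",", ".").replace("\n", ".")
    timestamp = datetime.now().strftime("%H:%M %d/%m/%y")
    return f"{phase},{clean_status},{timestamp},{jobid}"


# A list-based handler would append each new row to its internal list.
rows = []
rows.append(format_status_row("scan", "Success", jobid="12345"))
rows.append(format_status_row("compute", "Failed, retry needed"))
```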
- class padocc.core.filehandlers.FileIOMixin(dir: str, filename: str, logger: Logger | FalseLogger | None = None, label: str | None = None, fh: str | None = None, logid: str | None = None, dryrun: bool = False, forceful: bool = False, verbose: int = 0)
Bases:
LoggedOperation
Class containing the Filehandler behaviour that is identical across all Filehandler subclasses.
Identical behaviour
- Contains:
'item' in fh
Create/save file:
Filehandlers intrinsically know the file they are attached to, so no arguments are passed to either of these methods.
fh.create_file()
fh.close()
Get/set:
contents = fh.get()
fh.set(contents)
- create_file() None
Create the file if not on dryrun.
- property file: str
Returns the full filename attribute.
- file_exists() bool
Return true if the file is found.
- property filepath: str
Returns the full filepath attribute.
- remove_file() None
Remove the file on the filesystem if not on dryrun
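The `dryrun` behaviour of `create_file` and `remove_file` can be sketched as follows. `DryRunFile` is a hypothetical class written for this example and does not reproduce the PADOCC implementation; it only shows the pattern of skipping filesystem changes when `dryrun` is set.

```python
import os
import tempfile


class DryRunFile:
    """Hypothetical sketch of dryrun-aware file operations (not PADOCC code)."""

    def __init__(self, dir: str, filename: str, dryrun: bool = False):
        self.filepath = os.path.join(dir, filename)
        self.dryrun = dryrun

    def file_exists(self) -> bool:
        return os.path.isfile(self.filepath)

    def create_file(self) -> None:
        # Skip all filesystem changes when dryrun is set.
        if self.dryrun:
            return
        with open(self.filepath, "a"):
            pass

    def remove_file(self) -> None:
        if self.dryrun:
            return
        if self.file_exists():
            os.remove(self.filepath)


workdir = tempfile.mkdtemp()

# With dryrun enabled, nothing is created on disk.
dry = DryRunFile(workdir, "status.txt", dryrun=True)
dry.create_file()
dry_created = dry.file_exists()

# Without dryrun, the file is created and can then be removed.
real = DryRunFile(workdir, "status.txt")
real.create_file()
real_created = real.file_exists()
real.remove_file()
real_removed = not real.file_exists()
```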
- class padocc.core.filehandlers.GenericStore(parent_dir: str, store_name: str, metadata_name: str = '.zattrs', extension: str = 'zarr', logger: Logger | FalseLogger | None = None, label: str | None = None, fh: str | None = None, logid: str | None = None, dryrun: bool = False, forceful: bool = False, verbose: int = 0)
Bases:
LoggedOperation
Filehandler for Generic stores in Padocc - enables Filesystem operations on component files.
- clear() None
Remove all components of the store
- open(engine: str = 'zarr', **open_kwargs) Dataset
Open the store as a dataset (READ_ONLY)
- property store_path: str
Assemble the store path
- class padocc.core.filehandlers.JSONFileHandler(dir: str, filename: str, conf: dict | None = None, init_value: dict | None = None, **kwargs)
Bases:
FileIOMixin
JSON File handler for padocc config files.
- close() None
Save the content of the filehandler
- create_file() None
JSON files require a single dict to be written on creation.
- get(index: str | None = None, default: str | None = None) str | dict | None
Safe method to get a value from this filehandler
- set(value: dict) None
Set the value of the whole dictionary.
- class padocc.core.filehandlers.KerchunkFile(dir: str, filename: str, conf: dict | None = None, init_value: dict | None = None, **kwargs)
Bases:
JSONFileHandler
Filehandler for Kerchunk files. Enables substitution/replacement of local/remote links and updating of content.
- add_download_link(sub: str = '/', replace: str = 'https://dap.ceda.ac.uk') None
Add the download link to this Kerchunk File
- add_kerchunk_history(version_no: str) None
Add kerchunk variables to the metadata for this dataset, including creation/update date and version/revision number.
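To show what adding a download link to a Kerchunk reference set might involve, here is a minimal standalone sketch. It assumes the version 1 Kerchunk layout (a `refs` dictionary whose chunk entries are `[url, offset, length]` lists) and assumes that any path starting with `sub` is prefixed with `replace`. The function `add_prefix_to_refs` and the example path are hypothetical; this is not the `add_download_link` implementation.

```python
def add_prefix_to_refs(
    refs: dict, sub: str = "/", replace: str = "https://dap.ceda.ac.uk"
) -> dict:
    """Return a copy of a Kerchunk reference set with local paths made remote."""
    updated = {}
    for key, ref in refs["refs"].items():
        # Chunk references are [url, offset, length]; inline metadata stays as-is.
        if isinstance(ref, list) and len(ref) == 3 and ref[0].startswith(sub):
            updated[key] = [replace + ref[0], ref[1], ref[2]]
        else:
            updated[key] = ref
    return {**refs, "refs": updated}


# Hypothetical reference set with one metadata entry and one chunk entry.
example = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "tas/0.0": ["/badc/example/data.nc", 1024, 2048],
    },
}
linked = add_prefix_to_refs(example)
```

The sketch returns a new dictionary rather than editing in place, so the original reference set is left untouched.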
- class padocc.core.filehandlers.KerchunkStore(parent_dir: str, store_name: str, **kwargs)
Bases:
GenericStore
Filehandler for Kerchunk stores using parquet in PADOCC. Enables setting metadata attributes and will allow combining stores in future.
- open(*args, **parquet_kwargs) Dataset
Open the Parquet Store as an xarray dataset
- class padocc.core.filehandlers.ListFileHandler(dir: str, filename: str, extension: str | None = None, init_value: list | None = None, **kwargs)
Bases:
FileIOMixin
Filehandler for string-based Lists in Padocc
- append(newvalue: str) None
Add a new value to the internal list
- close() None
Save the content of the filehandler
- get() list
Get the current value
- set(value: list) None
Reset the value as a whole for this filehandler.
- class padocc.core.filehandlers.LogFileHandler(dir: str, filename: str, extra_path: str = '', **kwargs)
Bases:
ListFileHandler
Log File handler for padocc phase logs.
- property file: str
Returns the full filename attribute.
- class padocc.core.filehandlers.ZarrStore(parent_dir: str, store_name: str, **kwargs)
Bases:
GenericStore
Filehandler for Zarr stores in PADOCC. Enables manipulation of Zarr store on filesystem and setting metadata attributes.
- open(*args, **zarr_kwargs) Dataset
Open the ZarrStore as an xarray dataset
Utilities
- class padocc.core.utils.BypassSwitch(switch='D')
Bases:
object
Class to represent all bypass switches throughout the pipeline. Requires a switch string which is used to enable/disable specific pipeline switches stored in this class.
- padocc.core.utils.find_zarrays(refs: dict) dict
Quick way of extracting all the zarray components of a ref set.
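A minimal sketch of this kind of extraction is given below, assuming a version 1 Kerchunk reference set in which each `.zarray` entry is stored as a JSON string under the `refs` key. The name `collect_zarrays` is hypothetical, and decoding the JSON strings is a choice made for this example rather than confirmed behaviour of `find_zarrays`.

```python
import json


def collect_zarrays(refs: dict) -> dict:
    """Collect every '.zarray' entry from a Kerchunk reference set."""
    zarrays = {}
    for key, value in refs["refs"].items():
        if key.endswith(".zarray"):
            # Metadata entries are JSON strings in version 1 references.
            zarrays[key] = json.loads(value) if isinstance(value, str) else value
    return zarrays


# Hypothetical reference set with one variable.
example = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "tas/.zarray": '{"shape": [12, 90, 180], "chunks": [1, 90, 180]}',
        "tas/0.0.0": ["/path/to/file.nc", 0, 100],
    },
}
found = collect_zarrays(example)
```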
- padocc.core.utils.format_str(string: str, length: int, concat: bool = False, shorten: bool = False) str
Simple function to format a string to a correct length.
- padocc.core.utils.get_attribute(env: str, value: str) str
Fetch a value from an environment variable or from a passed argument. Finds the value of the variable from the environment or the ParseArgs object, or reports failure.
- Parameters:
env – (str) Name of environment variable.
args – (obj) Set of command line arguments supplied by argparse.
var – (str) Name of argparse parameter to check.
- Returns:
Value of either environment variable or argparse value.
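The general fallback pattern can be sketched with the standard library alone. The function `env_or_value`, the variable name `PADOCC_EXAMPLE_DIR`, and the precedence shown (an explicit value wins over the environment) are assumptions for illustration; the order used by `get_attribute` is not stated in this reference.

```python
import os
from typing import Optional


def env_or_value(env: str, value: Optional[str] = None) -> str:
    """Prefer an explicitly passed value, else fall back to the environment."""
    if value:
        return value
    if env in os.environ:
        return os.environ[env]
    # Report failure when neither source provides a value.
    raise KeyError(f"'{env}' is not set and no value was passed")


os.environ["PADOCC_EXAMPLE_DIR"] = "/tmp/example"
from_env = env_or_value("PADOCC_EXAMPLE_DIR")
from_arg = env_or_value("PADOCC_EXAMPLE_DIR", "/explicit/path")
```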
- padocc.core.utils.mem_to_val(value: str) float
Convert a memory value given as a string to a number of bytes.
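One plausible form of such a conversion is sketched below. The function `memory_to_bytes`, the accepted suffixes, and the use of decimal (1000-based) units are assumptions for this example; the input format that `mem_to_val` accepts is not specified here.

```python
def memory_to_bytes(value: str) -> float:
    """Convert a string such as '2GB' to bytes, assuming decimal units."""
    suffixes = {"TB": 1e12, "GB": 1e9, "MB": 1e6, "KB": 1e3, "B": 1.0}
    text = value.strip().upper()
    # Dict order matters: two-letter suffixes are checked before the bare 'B'.
    for suffix, factor in suffixes.items():
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * factor
    # No suffix: treat the value as a plain number of bytes.
    return float(text)


two_gb = memory_to_bytes("2GB")
half_mb = memory_to_bytes("0.5 MB")
plain = memory_to_bytes("100")
```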
- padocc.core.utils.open_kerchunk(kfile: str, logger, isparq=False, retry=False, attempt=1, **kwargs) Dataset
Open kerchunk file from JSON/parquet formats
- Parameters:
kfile – (str) Path to a kerchunk file (or https link if using a remote file)
logger – (obj) Logging object for info/debug/error messages.
isparq – (bool) Switch for using Parquet or JSON Format
remote_protocol – (str) ‘file’ for local filepaths, ‘http’ for remote links.
- Returns:
An xarray virtual dataset constructed from the Kerchunk file
Logging
- class padocc.core.logs.FalseLogger
Bases:
object
Supplementary class where a logger is not wanted but is required for some operations.
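The idea of a placeholder logger can be sketched as a class whose logging methods accept a message and do nothing. `NullLogger` and `process` are hypothetical names for this example and do not reproduce `FalseLogger`.

```python
class NullLogger:
    """Hypothetical no-op logger: accepts the usual calls and discards them."""

    def debug(self, message: str) -> None:
        pass

    def info(self, message: str) -> None:
        pass

    def warning(self, message: str) -> None:
        pass

    def error(self, message: str) -> None:
        pass


def process(items: list, logger=None) -> int:
    # Fall back to the no-op logger so the body never needs 'if logger:' checks.
    logger = logger or NullLogger()
    logger.info(f"Processing {len(items)} items")
    return len(items)


count = process(["a", "b", "c"])
```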
- class padocc.core.logs.LoggedOperation(logger: Logger | FalseLogger | None = None, label: str | None = None, fh: str | None = None, logid: str | None = None, verbose: int = 0)
Bases:
object
Allows inheritance of logger objects without creating new ones.
- padocc.core.logs.init_logger(verbose: int, name: str, fh: str = None, logid: str = None) Logger
Initialise and configure a logger object with standardised formatting.
- Parameters:
verbose – (int) Level of verbosity for log messages (see core.init_logger).
name – (str) The label to apply to the logger object.
fh – (str) Path to logfile for logger object generated in this specific process.
logid – (str) ID of the process within a subset, which is then added to the name of the logger - prevents multiple processes with different logfiles getting loggers confused.
- Returns:
A new logger object.
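A logger factory along these lines can be sketched with the standard `logging` module. The function `make_logger`, the mapping from `verbose` to logging levels, the way `logid` is appended to the name, and the message format are all assumptions for this example, not the behaviour of `init_logger`.

```python
import logging
import os
import tempfile
from typing import Optional

LEVELS = [logging.WARNING, logging.INFO, logging.DEBUG]


def make_logger(
    verbose: int, name: str, fh: Optional[str] = None, logid: Optional[str] = None
) -> logging.Logger:
    """Hypothetical sketch of a logger factory with standardised formatting."""
    # Appending logid keeps loggers from parallel processes distinct.
    if logid is not None:
        name = f"{name}_{logid}"
    logger = logging.getLogger(name)
    # Clamp verbosity into the range of known levels.
    logger.setLevel(LEVELS[max(0, min(verbose, len(LEVELS) - 1))])
    logger.propagate = False

    formatter = logging.Formatter("%(levelname)s [%(name)s]: %(message)s")
    # Write to a file when a path is given, otherwise to the console.
    handler = logging.FileHandler(fh) if fh else logging.StreamHandler()
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    return logger


logfile = os.path.join(tempfile.mkdtemp(), "scan.log")
log = make_logger(1, "scan", fh=logfile, logid="0")
log.info("Scan started")
log.debug("Hidden at this verbosity")

with open(logfile) as f:
    lines = f.read().splitlines()
```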
- padocc.core.logs.reset_file_handler(logger: Logger, verbose: int, fh: str) Logger
Reset the file handler for an existing logger object.
- Parameters:
logger – (logging.Logger) An existing logger object.
verbose – (int) The logging level to reapply.
fh – (str) Path to the new logfile for the file handler.
- Returns:
A new logger object with a new file handler.
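Redirecting an existing logger to a different logfile can be sketched with the standard `logging` module as below. `swap_file_handler`, the level mapping, and the logfile names are hypothetical. This sketch modifies and returns the same logger object, which may differ from how `reset_file_handler` works.

```python
import logging
import os
import tempfile


def swap_file_handler(logger: logging.Logger, verbose: int, fh: str) -> logging.Logger:
    """Hypothetical sketch: point an existing logger at a different logfile."""
    levels = [logging.WARNING, logging.INFO, logging.DEBUG]
    logger.setLevel(levels[max(0, min(verbose, len(levels) - 1))])
    # Iterate over a copy because removeHandler mutates logger.handlers.
    for handler in list(logger.handlers):
        if isinstance(handler, logging.FileHandler):
            handler.close()
            logger.removeHandler(handler)
    new_handler = logging.FileHandler(fh)
    new_handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(new_handler)
    return logger


workdir = tempfile.mkdtemp()
first = os.path.join(workdir, "first.log")
second = os.path.join(workdir, "second.log")

log = logging.getLogger("reset_example")
log.propagate = False
log.setLevel(logging.INFO)
log.addHandler(logging.FileHandler(first))
log.info("written to first")

# After the swap, new messages go only to the second file.
log = swap_file_handler(log, 1, second)
log.info("written to second")

with open(first) as f:
    first_text = f.read()
with open(second) as f:
    second_text = f.read()
```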