Padocc Filehandlers
Filehandlers are an integral component of PADOCC on the filesystem. They connect directly to files within the pipeline directories for different groups and projects, and provide a seamless environment for fetching and saving values to these files.
Filehandlers act like their respective data types in most or all methods. For example, the JSONFileHandler acts like a dictionary, but with extra methods to close and save the loaded data. Filehandlers can also be easily migrated or removed from the filesystem as part of other processes.
- class padocc.core.filehandlers.CFADataset(filepath: str, identifier: str, **kwargs)
Bases:
LoggedOperation
Basic handler for CFA dataset
Added behaviours
Open dataset - opens the CFA dataset
- close() None
Set the meta attribute for this dataset.
- get_meta() dict
Get the metadata/attributes for this dataset.
- open_dataset(**kwargs) Dataset
Open the CFA Dataset [READ-ONLY]
- set_meta(new_value: dict) None
Set the whole meta attribute for this dataset.
- Parameters:
new_value – (dict) New metadata contents.
- spawn_copy(copy: str)
Spawn a copy of this file (not filehandler)
- Parameters:
copy – (str) For the CFA filehandler, copy should be the full path to the new location, minus the extension. This should include the version number at the point of release.
- update_history(addition: str, new_version: str) None
Update the history with a new addition.
Sets the new version/revision automatically.
- Parameters:
addition – (str) Message to add to dataset history.
new_version – (str) New version the message applies to.
- class padocc.core.filehandlers.CSVFileHandler(dir: str, filename: str, **kwargs)
Bases:
ListFileHandler
CSV File handler for padocc config files
- update_status(phase: str, status: str, jobid: str = '') None
Update formatted status for this log with the phase and status
- Parameters:
phase – (str) The phase for which this project is being operated.
status – (str) The status of the current run (e.g. Success, Failed, Fatal)
jobid – (str) The jobID of this run if present.
- class padocc.core.filehandlers.FileIOMixin(dir: str, filename: str, logger: Logger | FalseLogger | None = None, label: str | None = None, fh: str | None = None, logid: str | None = None, dryrun: bool = False, forceful: bool = False, thorough: bool = False, verbose: int = 0)
Bases:
LoggedOperation
Class for containing Filehandler behaviour which is exactly identical for all Filehandler subclasses.
Identical behaviour
- Contains:
'item' in fh
Create/save file:
Filehandlers intrinsically know the file they are attached to, so no arguments are passed to either of these.
fh.create_file()
fh.close()
Get/set:
contents = fh.get()
fh.set(contents)
- create_file() None
Create the file if not on dryrun.
- property file: str
Returns the full filename attribute.
- file_exists() bool
Return true if the file is found.
- property filepath: str
Returns the full filepath attribute.
- move_file(new_dir: str, new_name: str | None = None, new_extension: str | None = None) None
Migrate the file to a new location.
- Parameters:
new_dir – (str) New directory for filehandler being moved.
new_name – (str) New name for filehandler if required.
new_extension – (str) New extension if required (e.g. changing log-type).
- remove_file() None
Remove the file on the filesystem if not on dryrun
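The dryrun-aware behaviour described for create_file and remove_file can be illustrated with a small stand-in class. This is only a sketch under stated assumptions: SketchFileIO and its internals are hypothetical and are not padocc's implementation; only the method names and the dryrun flag come from the reference above.

```python
import os
import tempfile


class SketchFileIO:
    """Hypothetical stand-in for dryrun-aware file operations (not padocc code)."""

    def __init__(self, dir, filename, dryrun=False):
        self.dir = dir
        self.filename = filename
        self.dryrun = dryrun

    @property
    def filepath(self):
        # Full path assembled from the directory and the filename.
        return os.path.join(self.dir, self.filename)

    def file_exists(self):
        return os.path.isfile(self.filepath)

    def create_file(self):
        # Touch the file on disk unless this is a dry run.
        if not self.dryrun:
            os.makedirs(self.dir, exist_ok=True)
            open(self.filepath, "a").close()

    def remove_file(self):
        # Delete the file on disk unless this is a dry run.
        if not self.dryrun and self.file_exists():
            os.remove(self.filepath)


with tempfile.TemporaryDirectory() as tmp:
    dry = SketchFileIO(tmp, "status.txt", dryrun=True)
    dry.create_file()
    dry_created = dry.file_exists()

    real = SketchFileIO(tmp, "status.txt")
    real.create_file()
    real_created = real.file_exists()
    real.remove_file()
    real_removed = not real.file_exists()
```

In this sketch the dry-run handler leaves the filesystem untouched, while the normal handler creates and then removes the file.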
- class padocc.core.filehandlers.GenericStore(parent_dir: str, store_name: str, metadata_name: str = '.zattrs', extension: str = 'zarr', logger: Logger | FalseLogger | None = None, label: str | None = None, fh: str | None = None, logid: str | None = None, dryrun: bool = False, forceful: bool = False, thorough: bool = False, verbose: int = 0)
Bases:
LoggedOperation
Filehandler for Generic stores in Padocc - enables Filesystem operations on component files.
Behaviours (Applies to Metadata)
Length - length of metadata keyset
Contains - metadata contains key (as with dict)
Indexable - Get/set a specific property.
Get/set_meta - Get/set the whole metadata set.
Clear - clears all files in the store.
- clear() None
Remove all components of the store
- close() None
Close the meta filehandler for this store
- get_meta()
Obtain the metadata dictionary
- property is_empty: bool
Check if the store contains any data
- set_meta(values: dict)
Reset the metadata dictionary
- Parameters:
values – (dict) Complete set of metadata for this store.
- spawn_copy(copy: str)
Spawn a copy of this store (not filehandler)
- Parameters:
copy – (str) New full path + name for external copy of the store (minus extension).
- property store_path: str
Assemble the store path
- update_history(addition: str, new_version: str) None
Update the history with a new addition.
Sets the new version/revision automatically.
- Parameters:
addition – (str) Message to add to dataset history.
new_version – (str) New version the message applies to.
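The update_history description (append a message, then set the new version) can be pictured as an operation on a plain metadata dictionary. The sketch below is illustrative only: the function name add_history and the "history" and "version" keys are assumptions, since this reference does not state which attributes padocc actually writes.

```python
def add_history(meta, addition, new_version):
    """Return a copy of meta with one more history entry and an updated version.

    Hypothetical helper: the 'history' and 'version' keys are assumed names.
    """
    updated = dict(meta)
    history = list(updated.get("history", []))
    history.append(addition)
    updated["history"] = history
    updated["version"] = new_version
    return updated


original = {"title": "example store", "history": ["Created"], "version": "1.0"}
revised = add_history(original, "Rechunked time dimension", "1.1")
```

Returning a copy keeps the original dictionary unchanged, which makes it easy to compare metadata before and after the update.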
- class padocc.core.filehandlers.JSONFileHandler(dir: str, filename: str, conf: dict | None = None, init_value: dict | None = None, **kwargs)
Bases:
FileIOMixin
JSON File handler for padocc config files.
Dictionary Behaviour
Indexable - index by key (as normal)
Contains - key in dict (as normal)
Length - length of the key set (as normal)
Added Behaviour
Iterable - iterate over the keys.
Get/set - get/set the whole value.
Create_file - Specific for JSON files.
- close() None
Save the content of the filehandler
- create_file() None
JSON files require entry of a single dict on creation.
- get(index: str | None = None, default: str | None = None) str | dict | None
Safe method to get a value from this filehandler.
- Parameters:
index – (str) Key in dictionary.
default – (str) Default value for this item in the dictionary.
- pop(index: str, default: Any = None) Any
Wrapper for the pop function of a dict.
- set(value: dict) None
Set the value of the whole dictionary.
- Parameters:
value – (dict) New value to set for this filehandler.
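The dictionary behaviour listed for JSONFileHandler (indexing, membership, length, iteration, whole-value get/set, and saving on close) can be sketched with a small wrapper around a JSON file. SketchJSONHandler is a hypothetical stand-in written for illustration; it mirrors the documented method names but is not the padocc class and makes no claim about its internals.

```python
import json
import os
import tempfile


class SketchJSONHandler:
    """Hypothetical dict-like wrapper around a JSON file (not padocc code)."""

    def __init__(self, dir, filename, init_value=None):
        self.filepath = os.path.join(dir, filename)
        if os.path.isfile(self.filepath):
            # Load existing content from disk.
            with open(self.filepath) as f:
                self._value = json.load(f)
        else:
            self._value = dict(init_value or {})

    def __getitem__(self, key):
        return self._value[key]

    def __setitem__(self, key, value):
        self._value[key] = value

    def __contains__(self, key):
        return key in self._value

    def __len__(self):
        return len(self._value)

    def __iter__(self):
        return iter(self._value)

    def get(self, index=None, default=None):
        # With no index, return the whole dictionary.
        if index is None:
            return self._value
        return self._value.get(index, default)

    def set(self, value):
        self._value = dict(value)

    def close(self):
        # Save the in-memory content back to the file.
        with open(self.filepath, "w") as f:
            json.dump(self._value, f)


with tempfile.TemporaryDirectory() as tmp:
    fh = SketchJSONHandler(tmp, "detail.json", init_value={"phase": "scan"})
    fh["status"] = "Success"
    fh.close()

    reopened = SketchJSONHandler(tmp, "detail.json")
    reopened_value = dict(reopened.get())
    reopened_keys = sorted(reopened)
    missing = reopened.get("jobid", "none")
```

Values set through indexing only reach the file when close() is called, which is the pattern the reference describes for saving loaded data.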
- class padocc.core.filehandlers.KerchunkFile(dir: str, filename: str, conf: dict | None = None, init_value: dict | None = None, **kwargs)
Bases:
JSONFileHandler
Filehandler for Kerchunk file, enables substitution/replacement for local/remote links, and updating content.
- add_download_link(sub: str = '/', replace: str = 'https://dap.ceda.ac.uk') None
Add the download link to this Kerchunk File.
- Parameters:
sub – (str) Substitution value to be replaced.
replace – (str) Replacement value in download links.
- get_meta() dict | None
Obtain the metadata dictionary
- open_dataset(fsspec_kwargs: dict | None = None, retry: bool = False, **kwargs) Dataset
Open the kerchunk file as a dataset
- Parameters:
fsspec_kwargs – (dict) Kwargs applied to fsspec mapper.
retry – (bool) Unused property for multiple tries when searching for kerchunk dataset.
- set_meta(values: dict)
Reset the metadata dictionary.
- Parameters:
values – (dict) Fully replace all zattrs in kerchunk dataset.
- spawn_copy(copy: str)
Spawn a copy of this file (not filehandler)
- Parameters:
copy – (str) Path to new copy location and filename (minus extension).
- update_history(addition: str, new_version: str) None
Update the history with a new addition.
Sets the new version/revision automatically.
- Parameters:
addition – (str) Message to add to dataset history.
new_version – (str) Specific version number for the history entry being applied.
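The substitution performed by add_download_link can be pictured as a prefix swap over the chunk references in a Kerchunk file. The following is a sketch under assumptions: swap_link_prefix is a hypothetical helper, the replacement rule (swap a leading prefix) is a guess at the intent rather than padocc's logic, and the paths and URL are made up. It relies only on the general Kerchunk convention that chunk references are lists of the form [path, offset, length].

```python
def swap_link_prefix(refs, sub, replace):
    """Return new refs where a leading `sub` in each chunk path becomes `replace`.

    Hypothetical helper, not padocc's add_download_link.
    """
    out = {}
    for key, ref in refs.items():
        is_chunk = isinstance(ref, list) and ref and isinstance(ref[0], str)
        if is_chunk and ref[0].startswith(sub):
            # Keep offset and length, rewrite only the path.
            out[key] = [replace + ref[0][len(sub):], *ref[1:]]
        else:
            # Inline metadata entries (plain strings) are left untouched.
            out[key] = ref
    return out


refs = {
    ".zgroup": '{"zarr_format": 2}',
    "tas/0.0": ["/data/example/file_a.nc", 1024, 4096],
}
linked = swap_link_prefix(refs, "/data", "https://example.org/data")
```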
- class padocc.core.filehandlers.KerchunkStore(parent_dir: str, store_name: str, **kwargs)
Bases:
GenericStore
Filehandler for Kerchunk stores using parquet in PADOCC. Enables setting metadata attributes and will allow combining stores in future.
Added behaviours
Open dataset - opens the kerchunk store.
- open_dataset(rfs_kwargs: dict | None = None, **parquet_kwargs) Dataset
Open the Parquet Store as an xarray dataset
- class padocc.core.filehandlers.ListFileHandler(dir: str, filename: str, extension: str | None = None, init_value: list | None = None, **kwargs)
Bases:
FileIOMixin
Filehandler for string-based Lists in Padocc.
List Behaviour
Append - works the same as with normal lists.
Pop - remove a specific value (works as normal).
Contains - (x in y) works as normal.
Length - (len(x)) works as normal.
Iterable - (for x in y) works as normal.
Indexable - (x[0]) works as normal
Added behaviour
Close - close and save the file.
Get/Set - Get or set the whole value.
- append(newvalue: str | list) None
Add a new value to the internal list.
- Parameters:
newvalue – (str|list) New value to append to current list.
- close() None
Save the content of the filehandler
- get() list
Get the current value
- remove(oldvalue: str) None
Remove a value from the internal list
- Parameters:
oldvalue – (str) Remove past value from list.
- set(value: list[str, list]) None
Reset the value as a whole for this filehandler.
- Parameters:
value – (list) Reset the _value property for this filehandler to the new value.
- class padocc.core.filehandlers.LogFileHandler(dir: str, filename: str, extra_path: str = '', **kwargs)
Bases:
ListFileHandler
Log File handler for padocc phase logs.
- property filepath: str
Returns the full filepath attribute.
- class padocc.core.filehandlers.ZarrStore(parent_dir: str, store_name: str, remote_s3: dict | None = None, **kwargs)
Bases:
GenericStore
Filehandler for Zarr stores in PADOCC. Enables manipulation of Zarr store on filesystem and setting metadata attributes.
Added Behaviours
Open dataset - open the zarr store.
Write to s3 - write a disk-based zarr store to s3.
- get_meta() dict
Override super function in case of remote s3.
- open_dataset(**zarr_kwargs) Dataset
Open the ZarrStore as an xarray dataset
- property store: str | object
Returns the store path or s3 store object as required.
- write_to_s3(credentials: dict | str, bucket_id: str, name_overwrite: str | None = None, s3_kwargs: dict = None, ds: Dataset | None = None, **zarr_kwargs)
Write zarr store to an S3 Object Store bucket directly from padocc
Utilities
- class padocc.core.utils.BypassSwitch(switch: str = 'D')
Bases:
object
Switch container class for multiple error switches.
Class to represent all bypass switches throughout the pipeline. Requires a switch string which is used to enable/disable specific pipeline switches stored in this class.
- padocc.core.utils.apply_substitutions(subkey: str, subs: dict | None = None, content: list | None = None)
Apply substitutions to all elements in the provided content list.
- Parameters:
subkey – (str) The key to extract from the provided set of substitutions. This covers the case where substitutions are specified for different levels of input files.
subs – (dict) The substitutions applied to the content.
content – (list) The set of filepaths to apply substitutions.
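A minimal sketch of applying keyed substitutions to a list of filepaths follows. The layout of subs is an assumption here (a mapping from subkey to a dictionary of old-to-new strings); the reference above does not specify it, and substitute_all is a hypothetical name, not the padocc function.

```python
def substitute_all(subkey, subs=None, content=None):
    """Apply the substitutions stored under `subkey` to every item in `content`.

    Assumed layout: subs = {subkey: {old: new, ...}}.
    """
    if not subs or subkey not in subs or content is None:
        # Nothing to apply, so hand the content back unchanged.
        return content
    result = []
    for line in content:
        for old, new in subs[subkey].items():
            line = line.replace(old, new)
        result.append(line)
    return result


subs = {"datasets": {"/old_root/": "/new_root/"}}
files = ["/old_root/a.nc", "/old_root/b.nc", "/other/c.nc"]
moved = substitute_all("datasets", subs, files)
untouched = substitute_all("groups", subs, files)
```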
- padocc.core.utils.deformat_float(item: str) str
Format byte-value with proper units.
- Parameters:
item – (str) Byte value to format into a float.
- padocc.core.utils.extract_file(input_file: str) list
Extract content from a padocc-external file.
Use filehandlers for files within the pipeline.
- Parameters:
input_file – (str) Pipeline-external file.
- padocc.core.utils.extract_json(input_file: str) list
Extract content from a padocc-external file.
Use filehandlers for files within the pipeline.
- Parameters:
input_file – (str) Pipeline-external file.
- padocc.core.utils.find_closest(num: int, closest: float) int
Find divisions for a dimension for rechunking purposes.
Used in Zarr rechunking and conversion.
- Parameters:
num – (int) Typically the size of the dimension
closest – (float) Find a divisor closest to this value.
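One way to find a divisor of a dimension size closest to a target value is a direct search over all divisors. This is a sketch of the idea only: closest_divisor is a hypothetical name, and the tie-breaking rule (prefer the smaller divisor) is an assumption rather than documented behaviour of find_closest.

```python
def closest_divisor(num, closest):
    """Return the divisor of `num` nearest to `closest` (smaller one wins ties)."""
    divisors = [d for d in range(1, num + 1) if num % d == 0]
    return min(divisors, key=lambda d: abs(d - closest))


# A 365-step time dimension with a target chunk length near 100.
time_chunk = closest_divisor(365, 100)
# A 1000-point dimension with a target chunk length near 30.
space_chunk = closest_divisor(1000, 30)
```

Choosing an exact divisor means the dimension splits into equally sized chunks with no short remainder chunk at the end.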
- padocc.core.utils.format_float(value: float) str
Format byte-value with proper units.
- Parameters:
value – (float) Number of bytes (avg), format to a string representation.
- padocc.core.utils.format_str(string: Any, length: int, concat: bool = False, shorten: bool = False) str
Simple function to format a string to a correct length.
- Parameters:
string – (str) Message to format into a string of exact length.
length – (int) Accepted length of string.
concat – (bool) If True, will add ellipses for overrunning strings.
shorten – (bool) If True will allow shorter messages, otherwise will fill with whitespace.
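The interaction of the concat and shorten flags can be sketched as follows. fit_string is a hypothetical illustration of fixed-width formatting; details such as using three dots for the ellipsis and padding on the right are assumptions, not confirmed behaviour of format_str.

```python
def fit_string(string, length, concat=False, shorten=False):
    """Fit `string` into `length` characters (illustrative sketch only)."""
    text = str(string)
    if len(text) > length:
        if concat and length > 3:
            # Truncate and mark the overrun with an ellipsis.
            return text[:length - 3] + "..."
        return text[:length]
    if shorten:
        # Shorter messages are allowed through without padding.
        return text
    # Otherwise pad with whitespace up to the exact length.
    return text.ljust(length)


truncated = fit_string("abcdefghij", 8, concat=True)
padded = fit_string("abc", 6)
short = fit_string("abc", 6, shorten=True)
```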
- padocc.core.utils.format_tuple(tup: tuple[list[int]]) str
Transform tuple to string representation
- Parameters:
tup – (tuple) Tuple object to be rendered to string.
- padocc.core.utils.get_attribute(env: str, args, value: str) str
Assemble environment variable or take from passed argument. Finds the value of the variable from the environment or the ParseArgs object, or reports failure.
- Parameters:
env – (str) Name of environment variable.
args – (obj) Set of command line arguments supplied by argparse.
value – (str) Name of argparse parameter to check.
- Returns:
Value of either environment variable or argparse value.
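Resolving a setting from either a parsed argument or an environment variable can be sketched with the standard library alone. In this hypothetical lookup_setting, the precedence (argument first, then environment) and the use of ValueError to report failure are assumptions; the reference above does not state either, and SKETCH_WORKDIR is a made-up variable name.

```python
import argparse
import os


def lookup_setting(env, args, value):
    """Return the argparse value if set, else the environment variable."""
    from_args = getattr(args, value, None)
    if from_args:
        return from_args
    from_env = os.environ.get(env)
    if from_env:
        return from_env
    # Neither source supplied a value, so report the failure.
    raise ValueError(f"Neither argument '{value}' nor ${env} is set")


os.environ["SKETCH_WORKDIR"] = "/tmp/from_env"

no_arg = argparse.Namespace(workdir=None)
with_arg = argparse.Namespace(workdir="/tmp/from_args")

env_result = lookup_setting("SKETCH_WORKDIR", no_arg, "workdir")
arg_result = lookup_setting("SKETCH_WORKDIR", with_arg, "workdir")
```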
- padocc.core.utils.list_groups(workdir: str, func: typing.Callable = print)
List groups in the existing working directory
- padocc.core.utils.make_tuple(item: Any) tuple
Make any object into a tuple.
- Parameters:
item – (Any) Insert item into a tuple if not already one.
- padocc.core.utils.mem_to_val(value: str) float
Convert a value in Bytes to an integer number of bytes.
- Parameters:
value – (str) Convert number of bytes (XB) to float.
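format_float and mem_to_val describe conversions in opposite directions between a byte count and a string with units. The pair below is a sketch of that round trip with hypothetical names; the decimal (1000-based) units, two-decimal formatting, and space-separated suffix are assumptions and may differ from padocc's output.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def bytes_to_str(value):
    """Render a byte count as a string with a unit suffix (decimal units assumed)."""
    unit = 0
    while value >= 1000 and unit < len(UNITS) - 1:
        value /= 1000
        unit += 1
    return f"{value:.2f} {UNITS[unit]}"


def str_to_bytes(text):
    """Parse a string such as '2 GB' back into a float number of bytes."""
    number, suffix = text.split()
    return float(number) * 1000 ** UNITS.index(suffix)


label = bytes_to_str(1536000)
count = str_to_bytes("2 GB")
```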
- padocc.core.utils.print_fmt_str(string: str, help_length: int = 40, concat: bool = True, shorten: bool = False)
Replacement for the callable function in help methods. This print replacement adds whitespace between functions and their help descriptions.
- Parameters:
string – (str) Message to format into a string of exact length.
help_length – (int) Accepted length of string.
concat – (bool) If True, will add ellipses for overrunning strings.
shorten – (bool) If True will allow shorter messages, otherwise will fill with whitespace.
Logging
- class padocc.core.logs.FalseLogger
Bases:
object
Supplementary class where a logger is not wanted but is required for some operations.
- class padocc.core.logs.LoggedOperation(logger: Logger | FalseLogger | None = None, label: str | None = None, fh: str | None = None, logid: str | None = None, forceful: bool = None, dryrun: bool = None, thorough: bool = None, verbose: int = 0)
Bases:
object
Allows inheritance of logger objects without creating new ones.
- classmethod help(func: typing.Callable = print)
No public methods.
- padocc.core.logs.init_logger(verbose: int, name: str, fh: str = None, logid: str = None) Logger
Initialise and configure a logger object with standardised formatting.
- Parameters:
verbose – (int) Level of verbosity for log messages (see core.init_logger).
name – (str) The label to apply to the logger object.
fh – (str) Path to logfile for logger object generated in this specific process.
logid – (str) ID of the process within a subset, which is then added to the name of the logger - prevents multiple processes with different logfiles getting loggers confused.
- Returns:
A new logger object.
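The parameters of init_logger map naturally onto Python's standard logging module. The sketch below uses a hypothetical build_logger; the verbosity-to-level mapping (0 = WARNING, 1 = INFO, 2 or more = DEBUG), the way logid is appended to the name, and the format string are all assumptions rather than padocc's actual choices.

```python
import logging


def build_logger(verbose, name, fh=None, logid=None):
    """Create a logger with a single handler and a fixed format (sketch only)."""
    levels = [logging.WARNING, logging.INFO, logging.DEBUG]
    # Appending the logid keeps loggers from separate processes distinct.
    full_name = f"{name}_{logid}" if logid is not None else name

    logger = logging.getLogger(full_name)
    logger.setLevel(levels[max(0, min(verbose, len(levels) - 1))])

    # Drop handlers left over from an earlier call with the same name.
    for old in list(logger.handlers):
        logger.removeHandler(old)

    handler = logging.FileHandler(fh) if fh else logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(levelname)s [%(name)s]: %(message)s"))
    logger.addHandler(handler)
    return logger


logger = build_logger(1, "sketch", logid="7")
logger.info("Logger configured")
```

Clearing existing handlers before adding a new one avoids duplicated log lines when the function is called more than once for the same name.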
- padocc.core.logs.reset_file_handler(logger: Logger, verbose: int, fh: str) Logger
Reset the file handler for an existing logger object.
- Parameters:
logger – (logging.Logger) An existing logger object.
verbose – (int) The logging level to reapply.
fh – (str) Address to new file handler.
- Returns:
A new logger object with a new file handler.