ProjectOperation Core and Mixin Behaviours
Source code for individual project operations and mixin behaviours.
- class padocc.core.project.ProjectOperation(proj_code: str, workdir: str, groupID: str = None, first_time: bool = None, ft_kwargs: dict = None, logger: ~logging.Logger = None, bypass: ~padocc.core.utils.BypassSwitch = <padocc.core.utils.BypassSwitch object>, label: str = None, fh: str = None, logid: str = None, verbose: int = 0, forceful: bool = None, dryrun: bool = None, thorough: bool = None, mem_allowed: str | None = None, remote_s3: dict | str | None = None)
PADOCC Project Operation class.
Provides access to project files and a set of simple operations. All single-project operations (e.g. Scan, Compute, Validate) inherit from this class.
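As a minimal usage sketch (the project code, group ID, and workdir path below are placeholders, and the import is deferred so the helper stands alone):

```python
def open_project(proj_code: str, workdir: str, group_id: str):
    """Construct a ProjectOperation in dry-run mode (sketch; arguments are placeholders)."""
    # Imported lazily so this sketch is self-contained.
    from padocc.core.project import ProjectOperation

    return ProjectOperation(
        proj_code=proj_code,
        workdir=workdir,
        groupID=group_id,
        verbose=1,
        dryrun=True,  # demonstrate commands without writing output
    )

# proj = open_project("my_project", "/path/to/workdir", "my_group")
# proj.info()
```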
- complete_project(move_to: str) → None
Move project to a completeness directory.
- Parameters:
move_to – (str) Path to the completeness directory into which the project content is extracted.
- delete_project(ask: bool = True)
Delete a project
- Parameters:
ask – (bool) If True (the default), display an ‘are you sure’ prompt before deleting.
- property dir
Project directory property, relative to workdir.
- file_exists(file: str)
Check if a named file exists (without extension).
This can be any generic filehandler attached.
- classmethod help(func: ~typing.Callable = <function print_fmt_str>)
Public user functions for the project operator.
- Parameters:
func – (Callable) provide an alternative to ‘print’ function for displaying help information.
- info()
Display some info about this particular project.
- migrate(newgroupID: str)
Migrate this project to a new group.
Moves the whole project directory on the filesystem and moves all associated filehandlers (individually).
- Parameters:
newgroupID – (str) ID of new group to move this project to.
- run(mode: str = 'kerchunk', bypass: BypassSwitch | None = None, forceful: bool = None, thorough: bool = None, dryrun: bool = None, **kwargs) → str
Main function for running any project operation.
All subclasses act as plugins for this function, and require a _run method called from here. This means all error handling with status logs and log files can be dealt with here. To find the parameters for a specific operation (e.g. compute with kerchunk mode), see the additional parameters of _run in the source code for the phase you are running. In this example, see padocc.phases.compute:KerchunkDS._run
- Parameters:
mode – (str) Cloud format to use for any operations. The default value is ‘kerchunk’, and any changes made via this project’s ‘cloud_format’ parameter are taken into account. Note: Setting the mode for a specific operation using this argument will reset the stored cloud format property for this class.
bypass – (BypassSwitch) instance of BypassSwitch class containing multiple bypass/skip options for specific events. See utils.BypassSwitch.
forceful – (bool) Continue with processing even if the final output file already exists.
dryrun – (bool) If True, prevents output files from being generated or updated and instead demonstrates the commands that would otherwise run.
thorough – (bool) From args.quality: if True, all files are created from scratch; otherwise saved refs from previous runs are loaded.
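The parameters above can be sketched as a typical phased invocation; `proj` is assumed to be an existing ProjectOperation instance, the call itself is commented out, and the argument values are illustrative only:

```python
# Arguments for a phased run; values are illustrative only.
run_kwargs = {
    "mode": "kerchunk",  # note: also resets the stored cloud format property
    "forceful": True,    # continue even if the final output already exists
    "dryrun": False,
    "thorough": False,   # reuse saved refs from previous runs
}

# status = proj.run(**run_kwargs)
# print(status)
```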
- save_files() → None
Save all filehandlers associated with this project.
- class padocc.core.mixins.dataset.DatasetHandlerMixin
Mixin class for properties relating to opening products.
This is a behavioural Mixin class and thus should not be directly accessed. Where possible, encapsulated classes should contain all relevant parameters for their operation as per convention, however this is not the case for mixin classes. The mixin classes here will explicitly state where they are designed to be used, as an extension of an existing class.
Use case: ProjectOperation [ONLY]
- add_s3_config(remote_s3: dict | str | None = None) → None
Add remote_s3 configuration for this project.
- Parameters:
remote_s3 – (dict | str) Remote s3 config argument, either dictionary or path to a json file on disk. It is not advised to enter credentials here, see the documentation in Extra Features for more details.
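A sketch of the two accepted forms of the argument; the dictionary keys, endpoint, and bucket name here are hypothetical placeholders (the exact schema is described in Extra Features), and the calls are commented out:

```python
# Hypothetical remote_s3 configuration; endpoint and bucket are placeholders.
# Per the note above, do not embed credentials here.
remote_s3 = {
    "endpoint_url": "https://s3.example.org",  # hypothetical endpoint
    "bucket_id": "my-bucket",                  # hypothetical bucket
}

# proj.add_s3_config(remote_s3=remote_s3)
# ...or point at a JSON file on disk instead:
# proj.add_s3_config(remote_s3="/path/to/s3_config.json")
```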
- property cfa_dataset: Dataset
Gets the product filehandler for the CFA dataset.
The CFA filehandler is currently read-only, and can be used to open an xarray representation of the dataset.
- property cfa_path: str
Path to the CFA object for this project.
- property dataset: KerchunkFile | GenericStore | CFADataset | None
Gets the product filehandler corresponding to cloud format.
Generic dataset property; links to the correct cloud format, given the Project’s cloud_format property, with other configurations applied.
- property dataset_attributes: dict
Fetch a dictionary of the metadata for the dataset.
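A short sketch combining the two properties above; `proj` is assumed to be a ProjectOperation instance, so the helper only defines the access pattern:

```python
def summarise_dataset(proj):
    """Print the dataset filehandler in use and its metadata (sketch)."""
    print(f"filehandler: {proj.dataset!r}")
    # dataset_attributes returns a plain dict of the dataset's metadata
    for key, value in proj.dataset_attributes.items():
        print(f"  {key}: {value}")

# summarise_dataset(proj)
```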
- classmethod help(func: ~typing.Callable = <built-in function print>)
Helper function to describe basic functions from this mixin
- Parameters:
func – (Callable) provide an alternative to ‘print’ function for displaying help information.
- property kfile: KerchunkFile | None
Retrieve the kfile filehandler, creating it if not present.
- property kstore: KerchunkStore | None
Retrieve the kstore filehandler, creating it if not present.
- remove_attribute(attribute: str, target: str = 'dataset') → None
Remove an attribute within a dataset representation’s metadata.
- Parameters:
attribute – (str) The name of an attribute within the metadata property of the corresponding filehandler.
target – (str) The target product filehandler, uses the generic dataset filehandler if not otherwise specified.
- remove_s3_config()
Remove remote_s3 configuration from this project
- save_ds_filehandlers()
Save all dataset files that already exist
Product filehandlers include kerchunk files, stores (via parquet) and zarr stores. The CFA filehandler is not currently editable, so is not included here.
- update_attribute(attribute: str, value: Any, target: str = 'dataset') → None
Update an attribute within a dataset representation’s metadata.
- Parameters:
attribute – (str) The name of an attribute within the metadata property of the corresponding filehandler.
value – (Any) The new value to set for this attribute.
target – (str) The target product filehandler, uses the generic dataset filehandler if not otherwise specified.
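The attribute-editing methods above can be combined into a small correction pass; `proj` is assumed to be a ProjectOperation instance, and the attribute names in the commented call are illustrative:

```python
def fix_metadata(proj, edits: dict, stale: list):
    """Apply attribute corrections, drop stale attributes, then save (sketch)."""
    for attr, value in edits.items():
        proj.update_attribute(attr, value, target="dataset")
    for attr in stale:
        proj.remove_attribute(attr, target="dataset")
    # Persist changes across all editable product filehandlers
    proj.save_ds_filehandlers()

# fix_metadata(proj, {"institution": "Example Institute"}, ["history"])
```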
- write_to_s3(credentials: dict | str, bucket_id: str, name_ovewrite: str | None = None, dataset_type: str = 'zstore', write_as: str = 'zarr', s3_kwargs: dict = None, **zarr_kwargs) → None
Write one of the active dataset objects to an s3 zarr store.
- class padocc.core.mixins.directory.DirectoryMixin(workdir: str, groupID: str = None, forceful: bool = None, dryrun: bool = None, thorough: bool = None, logger: Logger = None, bypass: BypassSwitch = None, label: str = None, fh: str = None, logid: str = None, verbose: int = 0)
Container class for Operations which require functionality to create directories (workdir, groupdir, cache etc.)
This Mixin gives all child classes the ability to manipulate the filesystem to create new directories as required, and handles the so-called fh-kwargs, which relate to forceful overwrites of filesystem objects, skipping creation, or starting from scratch, all relating to the filesystem.
This is a behavioural Mixin class and thus should not be directly accessed. Where possible, encapsulated classes should contain all relevant parameters for their operation as per convention, however this is not the case for mixin classes. The mixin classes here will explicitly state where they are designed to be used, as an extension of an existing class.
Use case: ProjectOperation, GroupOperation
- property groupdir
Group directory property
- classmethod help(func: ~typing.Callable = <built-in function print>)
No public methods
- class padocc.core.mixins.properties.PropertiesMixin
Properties relating to the ProjectOperation class that are stored separately for convenience and easier debugging.
This is a behavioural Mixin class and thus should not be directly accessed. Where possible, encapsulated classes should contain all relevant parameters for their operation as per convention, however this is not the case for mixin classes. The mixin classes here will explicitly state where they are designed to be used, as an extension of an existing class.
Use case: ProjectOperation [ONLY]
- apply_defaults(defaults: dict, target: str = 'dataset', remove: list | None = None)
Apply a default selection of attributes to a dataset.
- property cloud_format: str
Obtain the cloud format for this project.
Check multiple options from base and detail configs to find the cloud format for this project. The default is to use kerchunk.
- property complete_product: str
Return the name of the actual dataset.
Products are referred to by revision only within the project directory, but on completion these will be copied out of the pipeline, where they are renamed with the project code and revision for the actual dataset.
- property file_type: str
Return the file type for this project.
- get_stac_representation(stac_mapping: dict) → dict
Apply all required substitutions to the stac representation.
- Parameters:
stac_mapping – (dict) A padocc-map-compliant dictionary for extracting properties into a dictionary for STAC record-making.
- classmethod help(func: ~typing.Callable = <built-in function print>)
Helper function displays basic functions for use.
- Parameters:
func – (Callable) provide an alternative to ‘print’ function for displaying help information.
- major_version_increment()
Increment the major X.y part of the version number.
Use this function for major changes to the cloud file - e.g. replacement of source file data.
- minor_version_increment(addition: str | None = None)
Increment the minor x.Y number for the version.
Use this function for when properties of the cloud file have been changed.
- Parameters:
addition – (str) Reason for version change; attribute change or otherwise.
- property outpath: str
Path to the output product.
Takes into account the cloud format and type. Extension is applied via the Filehandler that this string is applied to.
- property outproduct: str
File/directory name for the output product.
Revision takes into account cloud format and type where applicable.
- property revision: str
Revision takes into account cloud format and type.
- property source_format: str
Get the source format of the files.
This is determined during the scanning process. Note: This returns the driver used in the kerchunk scanning process if that step has been completed.
- property version_no: str
Get the version number from the base config file.
This property is read-only, but currently can be forcibly overwritten by editing the base config.
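A sketch tying the naming and versioning properties above together; `proj` is assumed to be a ProjectOperation instance, and the version-increment reason in the commented call is illustrative:

```python
def describe_output(proj):
    """Print the identifiers that name the output product (sketch)."""
    print(f"cloud format: {proj.cloud_format}")
    print(f"version:      {proj.version_no}")
    print(f"revision:     {proj.revision}")
    print(f"product:      {proj.outproduct}")
    print(f"path:         {proj.outpath}")

# After a metadata-only change, bump the minor x.Y number:
# proj.minor_version_increment(addition="corrected institution attribute")
```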
- class padocc.core.mixins.status.StatusMixin
Methods for the ProjectOperation class that determine the status of previous runs.
This is a behavioural Mixin class and thus should not be directly accessed. Where possible, encapsulated classes should contain all relevant parameters for their operation as per convention, however this is not the case for mixin classes. The mixin classes here will explicitly state where they are designed to be used, as an extension of an existing class.
Use case: ProjectOperation [ONLY]
- get_last_run() → tuple
Get the tuple-value for this project’s last run.
- get_last_status() → str
Get the last line of the relevant log file.
- get_log_contents(phase: str) → str
Get the contents of the log file as a string.
- Parameters:
phase – (str) Phased operation from which to pull logs.
- get_report() dict
Get the validation report if present for this project.
- classmethod help(func: ~typing.Callable = <built-in function print>)
Helper function displays basic functions for use.
- Parameters:
func – (Callable) provide an alternative to ‘print’ function for displaying help information.
- set_last_run(phase: str, time: str) → None
Set the phase and time of the last run for this project.
- Parameters:
phase – (str) Phased operation of last run.
time – (str) Timestamp for operation.
- show_log_contents(phase: str, halt: bool = False, func: ~typing.Callable = <built-in function print>)
Format the contents of the log file to print.
- Parameters:
phase – (str) Phased operation to pull log data from.
halt – (bool) Stop and display log data, wait for input before continuing.
func – (Callable) provide an alternative to ‘print’ function for displaying help information.
- update_status(phase: str, status: str, jobid: str = '') → None
Update the status of a project.
Status updates are performed via the status log filehandler during phased operation of the pipeline.
- Parameters:
phase – (str) Phased operation being performed.
status – (str) Status of the phased operation outcome.
jobid – (str) ID of the SLURM job in which this operation has taken place.
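The status-inspection methods above can be combined into a small reporting helper; `proj` is assumed to be a ProjectOperation instance, and the phase name is illustrative:

```python
def report_status(proj, phase: str = "compute"):
    """Show the last recorded status and the full log for one phase (sketch)."""
    print(proj.get_last_status())
    # Pretty-print the log for the given phase without pausing for input
    proj.show_log_contents(phase, halt=False)

# report_status(proj)
```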