Projects in PADOCC
To differentiate syntax of datasets/datafiles with other packages that have varying definitions of those terms,
PADOCC uses the term Project
to refer to a set of files to be aggregated into a single ‘Cloud Product’.
The ProjectOperation
class within PADOCC allows us to access all information about a specific dataset, including
fetching data from files within the pipeline directory. This class also inherits from several Mixin classes which
act as containers for specific behaviours for easier organisation and future debugging.
The Project Operator class
The ‘core’ behaviour of all classes is contained in the ProjectOperation
class.
This class has public UI methods like info
and help
that give general information about a project,
and list some of the other public methods available respectively.
Project Operator:
> project.info() - Get some information about this project
> project.get_version() - Get the version number for the output product
> project.save_files() - Save all open files related to this project
Dataset Handling:
> project.dataset - Default product Filehandler (pointer) property
> project.dataset_attributes - Fetch metadata from the default dataset
> project.kfile - Kerchunk Filehandler property
> project.kstore - Kerchunk (Parquet) Filehandler property
> project.cfa_dataset - CFA Filehandler property
> project.zstore - Zarr Filehandler property
> project.update_attribute() - Update an attribute within the metadata
Status Options:
> project.get_last_run() - Get the last performed phase and time it occurred
> project.get_last_status() - Get the status of the previous core operation.
> project.get_log_contents() - Get the log contents of a previous core operation
Extra Properties:
> project.outpath - path to the output product (Kerchunk/Zarr)
> project.outproduct - name of the output product (minus extension)
> project.revision - Revision identifier (major + minor version plus type indicator)
> project.version_no - Get major + minor version identifier
> project.cloud_format[EDITABLE] - Cloud format (Kerchunk/Zarr) for this project
> project.file_type[EDITABLE] - The file type to use (e.g JSON/parq for kerchunk).
> project.source_format - Get the driver used by kerchunk
> project.get_stac_representation() - Provide a mapper, fills with values from the project to create a STAC record.
- Key Functions:
Acts as an access point to all information and data about a project (dataset).
Can adjust values within key files (abstracted) by setting specific parameters of the project instance and then using
save_files
.Enables quick stats gathering for use with group statistics calculations.
Can run any process on a project from the Project Operator.