Phased Operators
Scan Operations
- class padocc.phases.scan.ScanOperation(proj_code: str, workdir: str, groupID: str = None, label: str = 'scan', **kwargs)
Bases:
ProjectOperation
- help(fn=print)
Public user functions for the project operator.
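A minimal usage sketch (the project code, working directory and group ID are hypothetical):

```python
from padocc.phases.scan import ScanOperation

scan = ScanOperation(
    proj_code='my_project',
    workdir='/path/to/workdir',
    groupID='my_group',
)
scan.help()  # print the public user functions available on this operator
```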
Compute Operations
- class padocc.phases.compute.CfaDS(proj_code: str, workdir: str, groupID: str = None, stage: str = 'in_progress', thorough: bool = None, concat_msg: str = 'See individual files for more details', limiter: int = None, skip_concat: bool = False, label: str = 'compute', is_trial: bool = False, **kwargs)
Bases:
ComputeOperation
- class padocc.phases.compute.ComputeOperation(proj_code: str, workdir: str, groupID: str = None, stage: str = 'in_progress', thorough: bool = None, concat_msg: str = 'See individual files for more details', limiter: int = None, skip_concat: bool = False, label: str = 'compute', is_trial: bool = False, **kwargs)
Bases:
ProjectOperation
PADOCC Dataset Processor Class, capable of processing a single dataset’s worth of input files into a single aggregated file/store.
- property filelist
Quick property for obtaining a subset of the whole fileset. Originally used to open all the files with Xarray for later concatenation.
- help(fn=print)
Public user functions for the project operator.
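A minimal sketch of constructing the processor and inspecting its fileset (the project code and paths are hypothetical, and the exact effect of limiter is an assumption based on its name):

```python
from padocc.phases.compute import ComputeOperation

op = ComputeOperation(
    proj_code='my_project',
    workdir='/path/to/workdir',
    stage='in_progress',
    limiter=5,  # assumed to restrict processing to the first 5 input files
)
subset = op.filelist  # subset of the whole fileset (see the property above)
```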
- class padocc.phases.compute.KerchunkConverter(logger=None, bypass_driver=False, verbose=1, label=None, fh=None, logid=None)
Bases:
LoggedOperation
Class for converting a single file to a Kerchunk reference object. Handles known or unknown file types (NetCDF3/4 versions).
- run(nfile: str, filehandler=None, extension=None, **kwargs) → dict
Safe creation that accounts for known issues and tries multiple drivers.
- Returns:
A dictionary of Kerchunk references if successful; raises an error otherwise.
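A minimal sketch of converting a single file (the input path is hypothetical):

```python
from padocc.phases.compute import KerchunkConverter

converter = KerchunkConverter(verbose=1)
refs = converter.run('data/file_0001.nc')  # dict of Kerchunk references
```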
- class padocc.phases.compute.KerchunkDS(proj_code, workdir, stage='in_progress', **kwargs)
Bases:
ComputeOperation
- create_refs(check_dimensions: bool = False) → None
Organise the creation and loading of refs:
- Load existing cached refs
- Create new refs
- Combine metadata and global attributes into a single set
- Coordinate combining and saving of data
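A minimal sketch, assuming the same positional arguments as ComputeOperation (the project code and working directory are hypothetical):

```python
from padocc.phases.compute import KerchunkDS

ds = KerchunkDS('my_project', '/path/to/workdir', stage='in_progress')
ds.create_refs(check_dimensions=True)  # create, combine and save the refs
```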
- class padocc.phases.compute.ZarrDS(proj_code, workdir, stage='in_progress', mem_allowed: str = '100MB', preferences=None, **kwargs)
Bases:
ComputeOperation
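A minimal sketch; mem_allowed is assumed to cap the memory used when writing the Zarr store, and all values here are hypothetical:

```python
from padocc.phases.compute import ZarrDS

zds = ZarrDS('my_project', '/path/to/workdir', mem_allowed='500MB')
```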
- padocc.phases.compute.cfa_handler(instance, file_limit: int | None = None)
Handle the creation of a CFA-netCDF file using the CFAPyX package.
- Parameters:
instance – (obj) The reference instance of ProjectOperation from which to pull project-specific info.
file_limit – (int | None) The file limit to apply to a set of files.
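A minimal sketch; any ProjectOperation subclass instance can serve as the reference instance here, and the file-limit semantics (take at most N files) are an assumption:

```python
from padocc.phases.compute import ComputeOperation, cfa_handler

instance = ComputeOperation('my_project', '/path/to/workdir')
cfa_handler(instance, file_limit=10)  # build the CFA-netCDF file from a limited file set
```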
Validate Operations
- class padocc.phases.validate.Report(fh=None)
Bases:
object
Special report class, capable of utilising recursive dictionary value-setting.
- class padocc.phases.validate.ValidateDatasets(datasets: list, id: str, filehandlers: list[padocc.core.filehandlers.JSONFileHandler] | list[dict] | None = None, dataset_labels: list = None, preslice_fns: list = None, logger=None, label: str = None, fh: str = None, logid: str = None, verbose: int = 0)
Bases:
LoggedOperation
ValidateDatasets object for performing validations between two pseudo-identical Xarray Dataset objects.
Note (4th Dec): validate metadata using a single NetCDF (Xarray) dataset vs Kerchunk; validate data using a combined NetCDF or CFA dataset vs Kerchunk (for best performance).
- control_dataset_var(var)
Get a variable DataArray from the control dataset, performing preslice functions.
- replace_dataset(new_ds: Dataset, label: str = None, index: int = None, dstype: str = None) → None
Replace dataset by type, label or index.
- replace_preslice(new_preslice: Dataset, label: str = None, index: int = None, dstype: str = None) → None
Replace a preslice function by type, label or index.
- test_dataset_var(var)
Get a variable DataArray from the test dataset, performing preslice functions.
- validate_data()
Perform data validations using the growbox method for all variable DataArrays.
- validate_global_attrs(allowances: dict = None)
Validate the set of global attributes across all datasets.
- validate_metadata(allowances: dict = None) → dict
Run all metadata validation steps on this set of datasets.
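A minimal sketch of a validation run, assuming the first dataset is treated as the control and the second as the test dataset (file names, the id and the labels are hypothetical):

```python
import xarray as xr
from padocc.phases.validate import ValidateDatasets

control = xr.open_dataset('native_combined.nc')  # e.g. combined NetCDF/CFA
test = xr.open_dataset('cloud_product.nc')       # e.g. Kerchunk/Zarr product

validator = ValidateDatasets(
    datasets=[control, test],
    id='my_project-validation',
    dataset_labels=['control', 'test'],
)
report = validator.validate_metadata()  # dict of metadata validation results
validator.validate_data()               # growbox data validation per variable
```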
- class padocc.phases.validate.ValidateOperation(*args, **kwargs)
Bases:
ProjectOperation
Encapsulates all validation testing in a single class. Instantiated for a specific project, the object can hold all project info (from detail-cfg), opened only once, plus a copy of the total datasets (from native and cloud sources). Subselections can be passed between class methods along with a variable index (class variables: variable list, dimension list, etc.).
A class logger attribute means a logger does not need to be passed between functions, and the bypass switch is contained here with all other switches.
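Since the signature forwards *args and **kwargs, a sketch assuming the same core arguments as other ProjectOperation subclasses:

```python
from padocc.phases.validate import ValidateOperation

vop = ValidateOperation('my_project', '/path/to/workdir', groupID='my_group')
vop.help()  # public user functions, inherited from ProjectOperation
```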
- padocc.phases.validate.check_for_nan(box, bypass, logger, label=None)
Special function for assessing if a box selection has non-NaN values within it. Needs further testing using different data types.
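An illustrative sketch only: the box is a plain NumPy selection here, and the bypass argument is assumed to be the project's bypass-switch object (None is passed purely as a placeholder):

```python
import logging
import numpy as np
from padocc.phases.validate import check_for_nan

logger = logging.getLogger('validate')
box = np.array([[1.0, np.nan], [2.0, 3.0]])  # hypothetical box selection

# 'bypass' placeholder: substitute the project's bypass switches in real use
check_for_nan(box, None, logger, label='tas')
```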
- padocc.phases.validate.mem_to_value(mem) → float
Convert a memory string, e.g. '2G', into a numeric value.
- Returns:
Float value of e.g. '2G' in bytes.
- padocc.phases.validate.slice_all_dims(data_arr: DataArray, intval: int)
Slice all dimensions for the DataArray according to the integer value.
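A minimal sketch; the exact slicing rule applied per dimension is determined by the implementation, so the integer here is illustrative:

```python
import numpy as np
import xarray as xr
from padocc.phases.validate import slice_all_dims

arr = xr.DataArray(np.random.rand(10, 10, 10), dims=('time', 'lat', 'lon'))
small = slice_all_dims(arr, 2)  # slice every dimension by the integer value
```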
- padocc.phases.validate.value_to_mem(value) → str
Convert a number of bytes, e.g. 1000000000, into a memory string.
- Returns:
String value of the above (1000000000 -> '1G').
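A sketch of the two conversion utilities; the return values shown in comments are illustrative:

```python
from padocc.phases.validate import mem_to_value, value_to_mem

nbytes = mem_to_value('2G')          # e.g. 2000000000.0 (bytes)
label = value_to_mem(2_000_000_000)  # e.g. '2G'
```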
Ingest Operations
Note
The ingest operations are not featured in this development; they are still being planned and scoped.
- padocc.phases.ingest.add_download_link(group, workdir, proj_code)
Add the download link to each of the Kerchunk references.
- padocc.phases.ingest.ingest_config(args, logger)
Configure ingestion for a set of project codes, currently defined by a repeat_id; this could later be changed to apply to all project codes fitting some parameters.