Phased Operators
Scan Operations
Compute Operations
Validate Operations
- class padocc.phases.validate.PresliceSet(logger)
Bases: object
Preslice object for handling slices applied to datasets.
- apply(data_arr: DataArray, var: str) DataArray
Apply a preslice operation to a data array
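The preslice idea can be sketched without padocc: a minimal, purely illustrative stand-in in which variable names map to the slice applied before validation. Plain lists stand in for xarray DataArrays here, and the `temperature` variable and its slice are invented for the example; this is not padocc's actual implementation.

```python
# Hypothetical registry of preslices: variable name -> slice to apply
# before validation. Real PresliceSet objects operate on xarray
# DataArrays; plain lists stand in for data in this sketch.
preslices = {
    "temperature": slice(0, 3),  # invented variable/slice pair
}

def apply_preslice(data, var):
    """Apply the registered preslice for `var`, or pass data through."""
    return data[preslices.get(var, slice(None))]

print(apply_preslice([10, 20, 30, 40, 50], "temperature"))  # [10, 20, 30]
print(apply_preslice([10, 20, 30], "pressure"))             # unchanged
```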
- class padocc.phases.validate.Report(fh: dict | object | None = None, bypass: dict = None)
Bases: object
Special report class, capable of recursive dictionary value-setting.
- class padocc.phases.validate.ValidateDatasets(datasets: list, id: str, filehandlers: list[JSONFileHandler] | list[dict] | None = None, dataset_labels: list | None = None, preslice_fns: list | None = None, error_bypass: dict | str | None = None, logger=None, label: str | None = None, fh: str | None = None, logid: str | None = None, verbose: int = 0, bypass_vars: list | None = None, concat_dims: list | None = None)
Bases: LoggedOperation
ValidateDatasets object for performing validations between two pseudo-identical Xarray Dataset objects.
Note (4th Dec): validate metadata using a single NetCDF (Xarray) dataset vs Kerchunk; validate data using the combined NetCDF or CFA dataset vs Kerchunk (for best performance).
- control_dataset_var(var)
Get a variable DataArray from the control dataset, performing preslice functions.
- property data_report
Read-only data report
- decode_times_ok()
Determine if test and sample datasets have matching encodings.
- property metadata_report
Read-only metadata report
- replace_dataset(new_ds: Dataset, label: str = None, index: int = None, dstype: str = None) None
Replace a dataset by type, label or index.
- replace_preslice(new_preslice: Dataset, label: str = None, index: int = None, dstype: str = None) None
Replace a preslice function by type, label or index.
- save_report(filehandler=None)
Formulate the report so that it notes bypasses, determine the worst error, and save the report content.
- test_dataset_var(var)
Get a variable DataArray from the test dataset, performing preslice functions.
- validate_data(dim_mid: dict | None = None)
Perform data validations using the growbox method for all variable DataArrays.
- validate_global_attrs(allowances: dict = None)
Validate the set of global attributes across all datasets.
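The general shape of an attribute comparison with an `allowances` dict can be sketched as follows. This is a minimal illustration only, assuming `allowances` lists attribute names excused from comparison; padocc's actual report structure and allowance semantics may differ.

```python
def compare_attrs(control, test, allowances=None):
    """Compare two attribute dicts, skipping any keys in `allowances`.

    Illustrative sketch only; the report layout is an assumption,
    not padocc's actual format.
    """
    allowances = allowances or {}
    report = {}
    for key in set(control) | set(test):
        if key in allowances:
            continue  # this attribute is excused from comparison
        if control.get(key) != test.get(key):
            report[key] = {"control": control.get(key), "test": test.get(key)}
    return report

print(compare_attrs({"title": "A", "history": "x"},
                    {"title": "B", "history": "y"},
                    allowances={"history"}))
```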
- validate_metadata(allowances: dict = None) dict
Run all validation steps on this set of datasets.
- class padocc.phases.validate.ValidateOperation(proj_code, workdir, parallel: bool = False, **kwargs)
Bases: ProjectOperation
Encapsulates all validation testing in a single class. Instantiated for a specific project, the object contains all project info (from detail-cfg), opened only once, along with a copy of the total datasets (from native and cloud sources). Subselections can be passed between class methods along with a variable index (class variables: variable list, dimension list, etc.).
A class logger attribute means the logger does not need to be passed between functions; the bypass switch, with all its sub-switches, is also held here.
- padocc.phases.validate.check_for_nan(box, bypass, logger, label=None)
Special function for assessing if a box selection has non-NaN values within it. Needs further testing using different data types.
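The core of a non-NaN box check can be illustrated with a standalone helper. This sketch uses plain floats and the standard library only; padocc's `check_for_nan` additionally handles bypasses, logging, and array types, none of which are shown here.

```python
import math

def box_has_data(box):
    """Return True if any value in the box selection is non-NaN.

    Minimal stand-in for the idea behind check_for_nan; real box
    selections are array slices, not flat lists.
    """
    return any(not math.isnan(v) for v in box)

print(box_has_data([float("nan"), 2.0]))            # True
print(box_has_data([float("nan"), float("nan")]))   # False
```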
- padocc.phases.validate.mem_to_value(mem) float
Convert a memory string, e.g. '2G', into a numeric value.
- Returns:
Numeric value of e.g. '2G' in bytes.
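A possible shape for this conversion, as a self-contained sketch. The unit table, and whether padocc uses 1000- or 1024-based multipliers, are assumptions; this is not the library's actual code.

```python
def mem_to_value(mem):
    """Convert a memory string such as '2G' to a number of bytes.

    Illustrative reimplementation; 1000-based multipliers are an
    assumption here.
    """
    units = {"K": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4}
    if isinstance(mem, str) and mem[-1].upper() in units:
        return float(mem[:-1]) * units[mem[-1].upper()]
    return float(mem)

print(mem_to_value("2G"))   # 2000000000.0
print(mem_to_value("512"))  # 512.0
```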
- padocc.phases.validate.slice_all_dims(data_arr: DataArray, intval: int, dim_mid: dict[int, None] | None = None)
Slice all dimensions for the DataArray according to the integer value.
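One way to picture this is building a tuple of slices, one per dimension, centred on each dimension's midpoint unless `dim_mid` supplies one. This is a guess at the general idea, operating on a plain shape tuple rather than a DataArray, and is not padocc's implementation.

```python
def slice_all_dims_sketch(shape, width, dim_mid=None):
    """Build one slice per dimension of roughly `width` elements,
    centred on the dimension midpoint (or a supplied midpoint).

    Sketch only: `shape` is a tuple of dimension sizes, and
    `dim_mid` maps dimension index -> midpoint override.
    """
    dim_mid = dim_mid or {}
    slices = []
    for i, size in enumerate(shape):
        mid = dim_mid.get(i, size // 2)
        start = max(mid - width // 2, 0)       # clamp at the lower edge
        slices.append(slice(start, min(start + width, size)))
    return tuple(slices)

print(slice_all_dims_sketch((10, 10), 4))        # (slice(3, 7), slice(3, 7))
print(slice_all_dims_sketch((10,), 4, {0: 1}))   # (slice(0, 4),)
```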
- padocc.phases.validate.value_to_mem(value) str
Convert a number of bytes, e.g. 1000000000, into a string.
- Returns:
String representation of the input (1000000000 -> '1G').
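The inverse conversion can be sketched by walking down the unit sizes until the value fits. As with `mem_to_value` above, the 1000-based thresholds and output formatting are assumptions, not padocc's actual behaviour.

```python
def value_to_mem(value):
    """Convert a byte count to a short string, e.g. 1000000000 -> '1G'.

    Illustrative reimplementation with assumed 1000-based units.
    """
    for unit, size in (("T", 1000**4), ("G", 1000**3),
                       ("M", 1000**2), ("K", 1000)):
        if value >= size:
            return f"{value / size:g}{unit}"
    return str(value)

print(value_to_mem(1_000_000_000))  # 1G
print(value_to_mem(2_500_000))      # 2.5M
```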
Ingest Operations
Note
The ingest operations are not featured in this development; they are still being planned and scoped.