Phased Operators

Scan Operations

Compute Operations

Validate Operations

class padocc.phases.validate.PresliceSet(logger)

Bases: object

Preslice Object for handling slices applied to datasets.

apply(data_arr: DataArray, var: str) DataArray

Apply a preslice operation to a data array
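Conceptually, a preslice is a per-variable slice applied before comparison. The sketch below is illustrative only (`PresliceSketch` and its `add` method are hypothetical names, not padocc's implementation) and uses plain numpy arrays in place of xarray DataArrays:

```python
import numpy as np

# Hypothetical sketch of the preslice idea: keep a per-variable mapping
# of slices and apply the matching one when a variable is requested.
class PresliceSketch:
    def __init__(self):
        self._slices: dict[str, tuple] = {}

    def add(self, var: str, slc: tuple) -> None:
        # Register a slice tuple for one variable.
        self._slices[var] = slc

    def apply(self, data_arr: np.ndarray, var: str) -> np.ndarray:
        # Variables with no registered preslice are returned unchanged
        # (Ellipsis selects the whole array).
        return data_arr[self._slices.get(var, ...)]

ps = PresliceSketch()
ps.add("temp", (slice(0, 2),))
```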

class padocc.phases.validate.Report(fh: dict | object | None = None, bypass: dict = None)

Bases: object

Special report class, capable of utilising recursive dictionary value-setting.

class padocc.phases.validate.ValidateDatasets(datasets: list, id: str, filehandlers: list[JSONFileHandler] | list[dict] | None = None, dataset_labels: list | None = None, preslice_fns: list | None = None, error_bypass: dict | str | None = None, logger=None, label: str | None = None, fh: str | None = None, logid: str | None = None, verbose: int = 0, bypass_vars: list | None = None, concat_dims: list | None = None)

Bases: LoggedOperation

ValidateDatasets object for performing validations between two pseudo-identical Xarray Dataset objects.

Note (4th Dec): validate metadata using a single NetCDF (Xarray) dataset vs Kerchunk; validate data using a combined NetCDF or CFA dataset vs Kerchunk (for best performance).

control_dataset_var(var)

Get a variable DataArray from the control dataset, performing preslice functions.

property data_report

Read-only data report

decode_times_ok()

Determine whether the test and control datasets have matching time encodings.

property metadata_report

Read-only metadata report

replace_dataset(new_ds: Dataset, label: str = None, index: int = None, dstype: str = None) None

Replace dataset by type, label or index.

replace_preslice(new_preslice: Dataset, label: str = None, index: int = None, dstype: str = None) None

Replace a preslice function by type, label or index.

save_report(filehandler=None)

Formulate report such that it notes bypasses, and determine the worst error.

This is in addition to saving the report content.

test_dataset_var(var)

Get a variable DataArray from the test dataset, performing preslice functions.

validate_data(dim_mid: dict | None = None)

Perform data validations using the growbox method for all variable DataArrays.
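The growbox idea can be sketched as follows. This is a hedged, conceptual numpy illustration, not padocc's implementation (the function name `grow_box` and the doubling schedule are assumptions): start from a small central box and grow it until it contains non-NaN values suitable for comparison.

```python
import numpy as np

# Conceptual "growbox"-style selection: grow a centred box until it
# contains at least one non-NaN value, or fall back to the whole array.
def grow_box(arr: np.ndarray, start: int = 2) -> np.ndarray:
    size = start
    while size <= min(arr.shape):
        mid = [s // 2 for s in arr.shape]
        slices = tuple(
            slice(max(m - size // 2, 0), m - size // 2 + size)
            for m in mid
        )
        box = arr[slices]
        if not np.isnan(box).all():
            return box  # found usable (non-NaN) data
        size *= 2       # grow the box and retry
    return arr          # everything was NaN: return the full array

arr = np.full((8, 8), np.nan)
arr[4, 4] = 1.0
box = grow_box(arr)
```

The point of growing rather than reading everything is to compare a small, representative selection of each variable, which keeps validation cheap on large cloud-format datasets.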

validate_global_attrs(allowances: dict = None)

Validate the set of global attributes across all datasets

validate_metadata(allowances: dict = None) dict

Run all validation steps on this set of datasets.

class padocc.phases.validate.ValidateOperation(proj_code, workdir, parallel: bool = False, **kwargs)

Bases: ProjectOperation

Encapsulates all validation testing in a single class. Instantiated for a specific project, the object contains all project info (from detail-cfg), opened only once, plus a copy of the total datasets (from native and cloud sources). Subselections can be passed between class methods along with a variable index (class variables: variable list, dimension list, etc.).

A class-level logger attribute means the logger does not need to be passed between functions. The bypass switch, with all its sub-switches, is also held here.

padocc.phases.validate.check_for_nan(box, bypass, logger, label=None)

Special function for assessing whether a box selection has non-NaN values within it. Needs further testing with different data types.
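The core of such a check can be sketched in one line of numpy; this is an illustrative stand-in (the name `box_is_all_nan` is hypothetical), not padocc's signature, which also takes `bypass`, `logger` and `label` arguments:

```python
import numpy as np

# Report whether a box selection is entirely NaN (i.e. contains no
# usable data values for comparison).
def box_is_all_nan(box: np.ndarray) -> bool:
    return bool(np.isnan(box).all())
```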

padocc.phases.validate.mem_to_value(mem) float

Convert a memory string, e.g. '2G', into a numeric value.

Returns:

Float value of e.g. ‘2G’ in bytes.
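A minimal sketch of this kind of conversion, assuming decimal (power-of-1000) units; the name `mem_to_bytes` and the unit table are illustrative, not padocc's implementation:

```python
# Map unit suffixes to decimal byte multipliers (assumption: padocc may
# instead use binary, power-of-1024, units).
UNITS = {"K": 1000.0, "M": 1000.0**2, "G": 1000.0**3, "T": 1000.0**4}

def mem_to_bytes(mem: str) -> float:
    suffix = mem[-1].upper()
    if suffix in UNITS:
        # e.g. '2G' -> 2 * 10**9 bytes
        return float(mem[:-1]) * UNITS[suffix]
    # No recognised suffix: treat the string as a plain byte count.
    return float(mem)
```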

padocc.phases.validate.slice_all_dims(data_arr: DataArray, intval: int, dim_mid: dict[int, None] | None = None)

Slice all dimensions for the DataArray according to the integer value.
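The slicing logic can be illustrated with numpy. This is a hedged sketch: the real function operates on an xarray DataArray and accepts a `dim_mid` mapping to override the midpoints, whereas this version always centres the window and the helper name is assumed:

```python
import numpy as np

# Take a centred window of length `intval` along every dimension,
# clamped to the array bounds.
def slice_all_dims_sketch(arr: np.ndarray, intval: int) -> np.ndarray:
    slices = []
    for size in arr.shape:
        mid = size // 2
        start = max(mid - intval // 2, 0)
        slices.append(slice(start, min(start + intval, size)))
    return arr[tuple(slices)]

sub = slice_all_dims_sketch(np.arange(100).reshape(10, 10), 4)
```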

padocc.phases.validate.value_to_mem(value) str

Convert a number of bytes, e.g. 1000000000, into a memory string.

Returns:

String value of the above (e.g. 1000000000 -> ‘1G’).
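The inverse conversion can be sketched the same way, again assuming decimal units; `bytes_to_mem` is an illustrative name, not padocc's implementation:

```python
# Pick the largest decimal unit that fits and format the value;
# falls back to a plain byte count below 1K.
def bytes_to_mem(value: float) -> str:
    for suffix, scale in (
        ("T", 1000.0**4),
        ("G", 1000.0**3),
        ("M", 1000.0**2),
        ("K", 1000.0),
    ):
        if value >= scale:
            return f"{value / scale:g}{suffix}"
    return f"{value:g}B"
```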

Ingest Operations

Note

Not featured in this development, the ingest operations are still being planned and scoped.