Phased Operators

Scan Operations

Compute Operations

Validate Operations

class padocc.phases.validate.PresliceSet(logger)

Bases: object

Preslice Object for handling slices applied to datasets.

apply(data_arr: DataArray, var: str) DataArray

Apply a preslice operation to a data array
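Conceptually, a preslice is a per-variable slice applied before comparison. The sketch below is illustrative only (`PresliceSketch` and its `add` method are hypothetical names, not padocc's implementation) and uses plain numpy arrays in place of xarray DataArrays:

```python
import numpy as np

# Hypothetical sketch of the preslice idea: keep a per-variable mapping
# of slices and apply the matching one when a variable is requested.
class PresliceSketch:
    def __init__(self):
        self._slices: dict[str, tuple] = {}

    def add(self, var: str, slc: tuple) -> None:
        # Register a slice tuple for one variable.
        self._slices[var] = slc

    def apply(self, data_arr: np.ndarray, var: str) -> np.ndarray:
        # Variables with no registered preslice are returned unchanged
        # (Ellipsis selects the whole array).
        return data_arr[self._slices.get(var, ...)]

ps = PresliceSketch()
ps.add("temp", (slice(0, 2),))
```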

class padocc.phases.validate.Report(fh: dict | object | None = None, bypass: dict = None)

Bases: object

Special report class, capable of utilising recursive dictionary value-setting.

class padocc.phases.validate.ValidateDatasets(datasets: list, id: str, filehandlers: list[JSONFileHandler] | list[dict] | None = None, dataset_labels: list | None = None, preslice_fns: list | None = None, error_bypass: dict | str | None = None, logger=None, label: str | None = None, fh: str | None = None, logid: str | None = None, verbose: int = 0, bypass_vars: list | None = None, concat_dims: list | None = None)

Bases: LoggedOperation

ValidateDatasets object for performing validations between two pseudo-identical Xarray Dataset objects.

Note (4th Dec): validate metadata using a single NetCDF (Xarray) dataset vs Kerchunk; validate data using a combined NetCDF or CFA dataset vs Kerchunk (for best performance).

control_dataset_var(var)

Get a variable DataArray from the control dataset, performing preslice functions.

property data_report

Read-only data report

decode_times_ok()

Determine whether the test and control datasets have matching time encodings.

property metadata_report

Read-only metadata report

replace_dataset(new_ds: Dataset, label: str = None, index: int = None, dstype: str = None) None

Replace dataset by type, label or index.

replace_preslice(new_preslice: Dataset, label: str = None, index: int = None, dstype: str = None) None

Replace a preslice function by type, label or index.

save_report(filehandler=None)

Formulate report such that it notes bypasses, and determine the worst error.

This is in addition to saving the report content.

test_dataset_var(var)

Get a variable DataArray from the test dataset, performing preslice functions.

validate_data(dim_mid: dict | None = None)

Perform data validations using the growbox method for all variable DataArrays.
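The growbox idea can be sketched as follows. This is a hedged, conceptual numpy illustration, not padocc's implementation (the function name `grow_box` and the doubling schedule are assumptions): start from a small central box and grow it until it contains non-NaN values suitable for comparison.

```python
import numpy as np

# Conceptual "growbox"-style selection: grow a centred box until it
# contains at least one non-NaN value, or fall back to the whole array.
def grow_box(arr: np.ndarray, start: int = 2) -> np.ndarray:
    size = start
    while size <= min(arr.shape):
        mid = [s // 2 for s in arr.shape]
        slices = tuple(
            slice(max(m - size // 2, 0), m - size // 2 + size)
            for m in mid
        )
        box = arr[slices]
        if not np.isnan(box).all():
            return box  # found usable (non-NaN) data
        size *= 2       # grow the box and retry
    return arr          # everything was NaN: return the full array

arr = np.full((8, 8), np.nan)
arr[4, 4] = 1.0
box = grow_box(arr)
```

The point of growing rather than reading everything is to compare a small, representative selection of each variable, which keeps validation cheap on large cloud-format datasets.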

validate_global_attrs(allowances: dict = None)

Validate the set of global attributes across all datasets

validate_metadata(allowances: dict = None) dict

Run all validation steps on this set of datasets.

class padocc.phases.validate.ValidateOperation(proj_code, workdir, parallel: bool = False, **kwargs)

Bases: ProjectOperation

Encapsulates all validation testing in a single class. Instantiated for a specific project, the object contains all project info (from detail-cfg), opened only once, plus a copy of the total datasets (from native and cloud sources). Subselections can be passed between class methods along with a variable index (class variables: variable list, dimension list, etc.).

A class-level logger attribute means the logger does not need to be passed between functions. The bypass switch, with all its sub-switches, is also held here.

padocc.phases.validate.check_for_nan(box, bypass, logger, label=None)

Special function for assessing whether a box selection has non-NaN values within it. Needs further testing with different data types.
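The core of such a check can be sketched in one line of numpy; this is an illustrative stand-in (the name `box_is_all_nan` is hypothetical), not padocc's signature, which also takes `bypass`, `logger` and `label` arguments:

```python
import numpy as np

# Report whether a box selection is entirely NaN (i.e. contains no
# usable data values for comparison).
def box_is_all_nan(box: np.ndarray) -> bool:
    return bool(np.isnan(box).all())
```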

padocc.phases.validate.mem_to_value(mem) float

Convert a memory string, e.g. '2G', into a numeric value.

Returns:

Float value of e.g. ‘2G’ in bytes.
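A minimal sketch of this kind of conversion, assuming decimal (power-of-1000) units; the name `mem_to_bytes` and the unit table are illustrative, not padocc's implementation:

```python
# Map unit suffixes to decimal byte multipliers (assumption: padocc may
# instead use binary, power-of-1024, units).
UNITS = {"K": 1000.0, "M": 1000.0**2, "G": 1000.0**3, "T": 1000.0**4}

def mem_to_bytes(mem: str) -> float:
    suffix = mem[-1].upper()
    if suffix in UNITS:
        # e.g. '2G' -> 2 * 10**9 bytes
        return float(mem[:-1]) * UNITS[suffix]
    # No recognised suffix: treat the string as a plain byte count.
    return float(mem)
```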

padocc.phases.validate.slice_all_dims(data_arr: DataArray, intval: int, dim_mid: dict[int, None] | None = None)

Slice all dimensions for the DataArray according to the integer value.
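The slicing logic can be illustrated with numpy. This is a hedged sketch: the real function operates on an xarray DataArray and accepts a `dim_mid` mapping to override the midpoints, whereas this version always centres the window and the helper name is assumed:

```python
import numpy as np

# Take a centred window of length `intval` along every dimension,
# clamped to the array bounds.
def slice_all_dims_sketch(arr: np.ndarray, intval: int) -> np.ndarray:
    slices = []
    for size in arr.shape:
        mid = size // 2
        start = max(mid - intval // 2, 0)
        slices.append(slice(start, min(start + intval, size)))
    return arr[tuple(slices)]

sub = slice_all_dims_sketch(np.arange(100).reshape(10, 10), 4)
```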

padocc.phases.validate.value_to_mem(value) str

Convert a number of bytes, e.g. 1000000000, into a memory string.

Returns:

String value of the above (e.g. 1000000000 -> ‘1G’).
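The inverse conversion can be sketched the same way, again assuming decimal units; `bytes_to_mem` is an illustrative name, not padocc's implementation:

```python
# Pick the largest decimal unit that fits and format the value;
# falls back to a plain byte count below 1K.
def bytes_to_mem(value: float) -> str:
    for suffix, scale in (
        ("T", 1000.0**4),
        ("G", 1000.0**3),
        ("M", 1000.0**2),
        ("K", 1000.0),
    ):
        if value >= scale:
            return f"{value / scale:g}{suffix}"
    return f"{value:g}B"
```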

Ingest Operations

Note

Not featured in this development, the ingest operations are still being planned and scoped.