GroupOperation Core and Mixin Behaviours

Source code for group operations and mixin behaviours.

class padocc.groups.group.GroupOperation(groupID: str, workdir: str = None, forceful: bool = None, dryrun: bool = None, thorough: bool = None, logger: ~logging.Logger | ~padocc.core.logs.FalseLogger = None, bypass: ~padocc.core.utils.BypassSwitch = <padocc.core.utils.BypassSwitch object>, label: str = None, fh: str = None, logid: str = None, verbose: int = 0)

complete_group(move_to: str, thorough: bool = False, repeat_id: str = 'main'): Complete all projects for a group.

get_stac_representation(stac_mapping: dict, repeat_id: str = 'main') → list: Obtain all records for all projects in this group.

classmethod help(func: ~typing.Callable = <function print_fmt_str>): No public methods

info() → dict: Obtain a dictionary of key values for this object.

save_files(): Save all files associated with this group.

class padocc.groups.mixins.allocations.AllocationsMixin

Enables the use of Allocations for job deployments via Slurm.

This is a behavioural Mixin class and thus should not be directly accessed. Where possible, encapsulated classes should contain all relevant parameters for their operation, as per convention; however, this is not the case for mixin classes. Each mixin class here explicitly states where it is designed to be used, as an extension of an existing class.

Use case: GroupOperation [ONLY]
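The mixin convention described above can be sketched in isolation. The class names `StatusMixin` and `ToyGroup` below are hypothetical and are not part of padocc; they only illustrate how a behavioural mixin relies on attributes supplied by the class it extends.

```python
class StatusMixin:
    """Hypothetical behavioural mixin: defines no __init__ and owns no state."""

    def count_projects(self) -> int:
        # Relies on `self.proj_codes`, which the host class must provide.
        return len(self.proj_codes)


class ToyGroup(StatusMixin):
    """Hypothetical host class that supplies the attributes the mixin expects."""

    def __init__(self, group_id: str, proj_codes: list[str]):
        self.group_id = group_id
        self.proj_codes = proj_codes


group = ToyGroup("demo", ["proj_a", "proj_b"])
n_projects = group.count_projects()
```

Used on its own, the mixin fails with an `AttributeError` because nothing has set `proj_codes`, which is why such classes are only meant to be used through their host class.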

create_allocations(phase: str, repeat_id: str, band_increase: str | None = None, binpack: bool = None, **kwargs) → list

Function for assembling all allocations and bands for packing.

Allocations contain multiple processes within a single SLURM job such that the estimates for time sum to less than the time allowed for that SLURM job. Bands hold a single process per job, based on a default time plus any previous attempts (use the --allow-band-increase flag to enable band increases with successive attempts if previous jobs timed out).

Returns: A list of tuple objects such that each tuple represents an array to submit to SLURM with the attributes (label, time, number_of_datasets). Note: The list of datasets to apply in each array job is typically saved under proj_codes/<repeat_id>/<label>.txt (allocations use allocations/<x>.txt in place of the label).
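The packing idea behind allocations can be sketched as follows. This is not padocc's implementation: it assumes per-dataset time estimates in minutes and uses a first-fit decreasing heuristic, and the function name `pack_allocations` and the `allocations/<i>` labels are hypothetical. The returned tuples mirror the documented (label, time, number_of_datasets) shape.

```python
def pack_allocations(estimates: dict[str, int], time_allowed: int) -> list[tuple[str, int, int]]:
    """First-fit decreasing: place each dataset in the first allocation with room."""
    bins: list[list[str]] = []   # dataset codes held by each allocation
    totals: list[int] = []       # summed time estimate of each allocation

    # Consider the longest-running datasets first.
    for code, minutes in sorted(estimates.items(), key=lambda kv: kv[1], reverse=True):
        if minutes > time_allowed:
            raise ValueError(f"{code} exceeds the time allowed for one job")
        for i, total in enumerate(totals):
            # Assumption: a total equal to the limit is still accepted.
            if total + minutes <= time_allowed:
                bins[i].append(code)
                totals[i] += minutes
                break
        else:
            # No existing allocation has room, so open a new one.
            bins.append([code])
            totals.append(minutes)

    return [(f"allocations/{i}", totals[i], len(bins[i])) for i in range(len(bins))]


estimates = {"a": 30, "b": 20, "c": 45, "d": 10, "e": 15}
allocations = pack_allocations(estimates, time_allowed=60)
# allocations == [("allocations/0", 60, 2), ("allocations/1", 60, 3)]
```

Five datasets that would otherwise need five jobs fit into two allocations here, each within the 60-minute limit.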

deploy_parallel(phase: str, source: str, band_increase: str = None, forceful: bool = None, dryrun: bool = None, thorough: bool = None, verbose: int = 0, binpack: bool = None, time_allowed: str = None, memory: str = None, subset: int = None, repeat_id: str = 'main', bypass: ~padocc.core.utils.BypassSwitch = <padocc.core.utils.BypassSwitch object>, mode: str = 'kerchunk', new_version: str = None, func: ~typing.Callable = <built-in function print>) → None: Organise parallel deployment via SLURM.

padocc.groups.mixins.allocations.get_lotus_reqs(logger)

Extract Lotus config from filesystem.

Use default if no config supplied.

class padocc.groups.mixins.evaluations.EvaluationsMixin

Group Mixin for methods to evaluate the status of a group.

This is a behavioural Mixin class and thus should not be directly accessed. Where possible, encapsulated classes should contain all relevant parameters for their operation, as per convention; however, this is not the case for mixin classes. Each mixin class here explicitly states where it is designed to be used, as an extension of an existing class.

Use case: GroupOperation [ONLY]

check_attribute(attribute: str, repeat_id: str = 'main', func: ~collections.abc.Callable = <built-in function print>): Check an attribute across all projects.

get_project(proj_code: str)

Get a project operation from this group.

Works on string codes only.

merge_subsets(subset_list: list[str], combined_id: str, remove_after: bool = False) → None: Merge one or more of the subsets previously created

remove_by_status(status: str, phase: str | None = None, old_repeat_id: str = 'main') → None: Group projects by their status for removal from the group

remove_subset(repeat_id: str) → None

Remove a subset from the group.

Parameters: repeat_id – (str) The repeat_id classifying the subset in this group to which this operation will apply.

repeat_by_status(status: str, new_repeat_id: str, phase: str | None = None, old_repeat_id: str = 'main') → None: Group projects by their status, to then create a new repeat ID.
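As a rough illustration of grouping by status, the sketch below filters project codes into a new repeat ID. The in-memory `statuses` mapping, the helper `select_by_status`, and the phase and status names are all hypothetical; padocc stores and reads this information in its own way.

```python
from typing import Optional


def select_by_status(statuses: dict[str, tuple[str, str]], status: str,
                     phase: Optional[str] = None) -> list[str]:
    """Return the project codes whose status (and optionally phase) match."""
    return [
        code
        for code, (proj_phase, proj_status) in statuses.items()
        if proj_status == status and (phase is None or proj_phase == phase)
    ]


# Hypothetical (phase, status) record for each project code.
statuses = {
    "proj_a": ("scan", "Success"),
    "proj_b": ("compute", "Fatal"),
    "proj_c": ("scan", "Fatal"),
}

# Each repeat ID maps to the list of project codes it covers.
repeats = {"main": list(statuses)}
repeats["fatal_scan"] = select_by_status(statuses, "Fatal", phase="scan")
```

Leaving `phase` as `None` selects on status alone, matching the optional `phase` argument in the signature above.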

summarise_data(repeat_id: str = 'main', func: ~collections.abc.Callable = <built-in function print>): Summarise data stored across all projects, mostly concatenating results from the detail-cfg files from all projects.

summarise_status(repeat_id: str = 'main', specific_phase: str | None = None, specific_error: str | None = None, long_display: bool | None = None, display_upto: int = 5, halt: bool = False, write: bool = False, fn: ~collections.abc.Callable = <built-in function print>) → None

Gives a general overview of progress within the pipeline:

- How many datasets are currently at each stage of the pipeline
- Errors within each pipeline phase
- Allows for examination of error logs
- Allows saving codes matching an error type into a new repeat group
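The kind of per-phase tally that a status summary reports can be sketched with the standard library. The `statuses` mapping, the helper name `summarise`, and the phase and status names are hypothetical; this is not how padocc gathers or displays its summary.

```python
from collections import Counter, defaultdict


def summarise(statuses: dict[str, tuple[str, str]]) -> dict[str, dict[str, int]]:
    """Count how many projects sit in each phase, broken down by status."""
    summary: defaultdict[str, Counter] = defaultdict(Counter)
    for phase, status in statuses.values():
        summary[phase][status] += 1
    # Convert to plain dicts so the result is easy to print or serialise.
    return {phase: dict(counts) for phase, counts in summary.items()}


# Hypothetical (phase, status) record for each project code.
statuses = {
    "proj_a": ("scan", "Success"),
    "proj_b": ("compute", "Fatal"),
    "proj_c": ("scan", "Fatal"),
}
overview = summarise(statuses)
# overview == {"scan": {"Success": 1, "Fatal": 1}, "compute": {"Fatal": 1}}
```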

class padocc.groups.mixins.initialisation.InitialisationMixin

Mixin container class for initialisation routines for groups via input files.

This is a behavioural Mixin class and thus should not be directly accessed. Where possible, encapsulated classes should contain all relevant parameters for their operation, as per convention; however, this is not the case for mixin classes. Each mixin class here explicitly states where it is designed to be used, as an extension of an existing class.

Use case: GroupOperation [ONLY]

init_from_file(input_file: str, substitutions: dict = None, remote_s3: dict | str | None = None) → None

Run initialisation by loading configurations from input sources: determine the input file type, then use the appropriate functions to instantiate group and project directories.

Parameters: input_file – (str) Path to an input file from which to initialise the project.
Returns: None

class padocc.groups.mixins.modifiers.ModifiersMixin

Modifiers to the group in terms of the projects associated, allows adding and removing projects.

This is a behavioural Mixin class and thus should not be directly accessed. Where possible, encapsulated classes should contain all relevant parameters for their operation, as per convention; however, this is not the case for mixin classes. Each mixin class here explicitly states where it is designed to be used, as an extension of an existing class.

Use case: GroupOperation [ONLY]

add_project(config: str | dict, remote_s3: dict | str | None = None, moles_tags: bool = False)

Add a project to this group.

Parameters:

config – (str | dict) The configuration details for the new project. Can be either a path to a JSON file or JSON content directly, and either a properly formatted base config file (needs proj_code, pattern etc.) or a moles_esgf input file.
moles_tags – (bool) Option for CEDA staff to integrate output from another package.

apply_pfunc(pfunc: Callable, repeat_id: str = 'main'): Apply a custom function across all projects.

catalog_ceda(final_location: str, api_key: str, collection: str, name_list: list | None = None, repeat_id: str = 'main'): Catalog all projects in the group into the ceda-cloud-projects index.

delete_group(ask: bool = True) → None: Delete the entire set of files associated with this group.

merge(group_B)

Merge group B into group A.

1. Migrate all projects from B to A and reset groupID values.
2. Combine datasets.csv.
3. Combine project codes.
4. Combine faultlists.

Note: This is not a class method. The self object is replaced by group_A for convenience.
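The four merge steps can be sketched on a toy in-memory model. `ToyGroup`, its fields, and the row format used for datasets.csv are hypothetical stand-ins, not padocc classes or file layouts. The source does not say what happens to group B afterwards, so the sketch leaves it untouched.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGroup:
    """Hypothetical in-memory stand-in for a group; not a padocc class."""
    group_id: str
    projects: dict[str, str] = field(default_factory=dict)   # proj_code -> owning groupID
    datasets: list[str] = field(default_factory=list)        # rows of datasets.csv
    proj_codes: list[str] = field(default_factory=list)
    faultlist: list[str] = field(default_factory=list)


def merge(group_a: ToyGroup, group_b: ToyGroup) -> None:
    # 1. Migrate all projects from B to A and reset their groupID values.
    for code in group_b.projects:
        group_a.projects[code] = group_a.group_id
    # 2. Combine the datasets.csv rows.
    group_a.datasets.extend(group_b.datasets)
    # 3. Combine the project codes.
    group_a.proj_codes.extend(group_b.proj_codes)
    # 4. Combine the faultlists.
    group_a.faultlist.extend(group_b.faultlist)


group_a = ToyGroup("A", {"p1": "A"}, ["p1,/data/p1"], ["p1"], [])
group_b = ToyGroup("B", {"p2": "B"}, ["p2,/data/p2"], ["p2"], ["p2"])
merge(group_a, group_b)
```

After the call, `group_a` owns both projects and holds the combined rows, project codes and faultlist.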

remove_project(proj_code: str, ask: bool = True) → None

Remove a project from this group. Steps required:

1. Remove the project directory including all internal files.
2. Remove the project code from all project files.

set_all_values(attr: str, value: Any, repeat_id: str = 'main'): Set a particular value for all projects in a group.

transfer_project(proj_code: str, receiver_group) → None: Transfer an existing project to a new group

unmerge(group_B, dataset_list: list)

Separate elements from group_A into group_B according to the list.

1. Migrate projects.
2. Set the datasets.
3. Set the faultlists.
4. Project codes (remove group B sections).

Note: This is not a class method. The self object is replaced by group_A for convenience.