GroupOperation Core and Mixin Behaviours

Source code for group operations and mixin behaviours.

class padocc.groups.group.GroupOperation(groupID: str, workdir: str = None, forceful: bool = None, dryrun: bool = None, thorough: bool = None, logger: ~logging.Logger | ~padocc.core.logs.FalseLogger = None, bypass: ~padocc.core.utils.BypassSwitch = <padocc.core.utils.BypassSwitch object>, label: str = None, fh: str = None, logid: str = None, verbose: int = 0)
complete_group(move_to: str, repeat_id: str = 'main')

Complete all projects for a group.

get_stac_representation(stac_mapping: dict, repeat_id: str = 'main') list

Obtain all records for all projects in this group.

classmethod help(func: ~typing.Callable = <function print_fmt_str>)

No public methods

info() dict

Obtain a dictionary of key values for this object.

save_files()

Save all files associated with this group.

class padocc.groups.mixins.allocations.AllocationsMixin

Enables the use of Allocations for job deployments via Slurm.

This is a behavioural Mixin class and thus should not be directly accessed. Where possible, encapsulated classes should contain all relevant parameters for their operation as per convention; however, this is not the case for mixin classes. The mixin classes here will explicitly state where they are designed to be used, as an extension of an existing class.

Use case: GroupOperation [ONLY]

create_allocations(phase: str, repeat_id: str, band_increase: str | None = None, binpack: bool = None, **kwargs) list

Function for assembling all allocations and bands for packing.

Allocations contain multiple processes within a single SLURM job such that the time estimates sum to less than the time allowed for that SLURM job. Bands run a single process per job, with a time based on a default plus any previous attempts (use the --allow-band-increase flag to enable band increases with successive attempts if previous jobs timed out).

Returns:

A list of tuple objects such that each tuple represents an array to submit to SLURM with the attributes (label, time, number_of_datasets). Note: The list of datasets to apply in each array job is typically saved under proj_codes/<repeat_id>/<label>.txt (allocations use allocations/<x>.txt in place of the label).
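The packing idea described above can be illustrated with a minimal sketch. This is not padocc's implementation: the function name pack_allocations, the first-fit-decreasing strategy, the use of minutes as the unit, and the rule that anything too long for one job is set aside for a band are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def pack_allocations(
    estimates: Dict[str, int], time_allowed: int
) -> Tuple[List[List[str]], List[str]]:
    """First-fit-decreasing packing of per-dataset time estimates (minutes).

    Returns (allocations, unpacked). Each allocation is a list of dataset
    codes whose estimates sum to at most ``time_allowed``. Datasets whose
    estimate alone exceeds the limit are returned in ``unpacked`` so they
    can be run one per job instead (an assumption of this sketch).
    """
    allocations: List[List[str]] = []
    remaining: List[int] = []  # time still free in each allocation
    unpacked: List[str] = []

    # Place the largest estimates first; this usually packs more tightly.
    for code, minutes in sorted(estimates.items(), key=lambda kv: -kv[1]):
        if minutes > time_allowed:
            unpacked.append(code)
            continue
        for i, left in enumerate(remaining):
            if minutes <= left:
                allocations[i].append(code)
                remaining[i] -= minutes
                break
        else:
            # No existing allocation has room, so open a new one.
            allocations.append([code])
            remaining.append(time_allowed - minutes)
    return allocations, unpacked


# Hypothetical estimates in minutes for five datasets.
estimates = {"a": 50, "b": 40, "c": 30, "d": 20, "e": 90}
allocations, unpacked = pack_allocations(estimates, time_allowed=60)
```

With these made-up numbers, dataset "e" cannot fit in any 60-minute job and is left out of the allocations, while "b" and "d" share one job.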

deploy_parallel(phase: str, source: str, band_increase: str = None, forceful: bool = None, dryrun: bool = None, thorough: bool = None, verbose: int = 0, binpack: bool = None, time_allowed: str = None, memory: str = None, subset: int = None, repeat_id: str = 'main', bypass: ~padocc.core.utils.BypassSwitch = <padocc.core.utils.BypassSwitch object>, mode: str = 'kerchunk', new_version: str = None, func: ~typing.Callable = <built-in function print>) None

Organise parallel deployment via SLURM.

padocc.groups.mixins.allocations.get_lotus_reqs(logger)

Extract Lotus config from filesystem.

Use default if no config supplied.

class padocc.groups.mixins.evaluations.EvaluationsMixin

Group Mixin for methods to evaluate the status of a group.

This is a behavioural Mixin class and thus should not be directly accessed. Where possible, encapsulated classes should contain all relevant parameters for their operation as per convention; however, this is not the case for mixin classes. The mixin classes here will explicitly state where they are designed to be used, as an extension of an existing class.

Use case: GroupOperation [ONLY]

check_attribute(attribute: str, repeat_id: str = 'main', func: ~collections.abc.Callable = <built-in function print>)

Check an attribute across all projects.

get_project(proj_code: str)

Get a project operation from this group.

merge_subsets(subset_list: list[str], combined_id: str, remove_after: bool = False) None

Merge one or more of the subsets previously created.

remove_by_status(status: str, phase: str | None = None, old_repeat_id: str = 'main') None

Group projects by their status for removal from the group.

remove_subset(repeat_id: str) None

Remove a subset from the group.

Parameters:

repeat_id – (str) The repeat_id classifying the subset in this group to which this operation will apply.

repeat_by_status(status: str, new_repeat_id: str, phase: str | None = None, old_repeat_id: str = 'main') None

Group projects by their status, to then create a new repeat ID.

summarise_data(repeat_id: str = 'main', func: ~collections.abc.Callable = <built-in function print>)

Summarise data stored across all projects, mostly concatenating results from the detail-cfg files from all projects.

summarise_status(repeat_id: str = 'main', specific_phase: str | None = None, specific_error: str | None = None, long_display: bool | None = None, display_upto: int = 5, halt: bool = False, write: bool = False, fn: ~collections.abc.Callable = <built-in function print>) None

Gives a general overview of progress within the pipeline:

  • How many datasets currently at each stage of the pipeline

  • Errors within each pipeline phase

  • Allows for examination of error logs

  • Allows saving codes matching an error type into a new repeat group
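The kind of overview described above can be sketched as a simple tally. Everything in this example is hypothetical: the record layout, the phase and status names, and the helper functions are illustrative stand-ins, since this reference does not show how padocc stores project status.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical (proj_code, phase, status) records for four projects.
records: List[Tuple[str, str, str]] = [
    ("proj_1", "scan", "Success"),
    ("proj_2", "scan", "Fatal"),
    ("proj_3", "compute", "Success"),
    ("proj_4", "compute", "Success"),
]


def summarise(rows: List[Tuple[str, str, str]]) -> Dict[str, Dict[str, int]]:
    """Count how many projects hold each status within each phase."""
    summary: Dict[str, Counter] = defaultdict(Counter)
    for _code, phase, status in rows:
        summary[phase][status] += 1
    return {phase: dict(counts) for phase, counts in summary.items()}


def codes_matching(rows: List[Tuple[str, str, str]], status: str) -> List[str]:
    """Collect project codes with a given status, e.g. to seed a repeat group."""
    return [code for code, _phase, s in rows if s == status]


summary = summarise(records)
fatal_codes = codes_matching(records, "Fatal")
```

The second helper mirrors the last bullet: the codes it returns are the ones that would be saved into a new repeat group.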

class padocc.groups.mixins.initialisation.InitialisationMixin

Mixin container class for initialisation routines for groups via input files.

This is a behavioural Mixin class and thus should not be directly accessed. Where possible, encapsulated classes should contain all relevant parameters for their operation as per convention; however, this is not the case for mixin classes. The mixin classes here will explicitly state where they are designed to be used, as an extension of an existing class.

Use case: GroupOperation [ONLY]

init_from_file(input_file: str, substitutions: dict = None, remote_s3: dict | str | None = None) None

Run initialisation by loading configurations from input sources, determining the input file type and using the appropriate functions to instantiate group and project directories.

Parameters:

input_file – (str) Path to an input file from which to initialise the project.

Returns:

None

class padocc.groups.mixins.modifiers.ModifiersMixin

Modifiers to the group in terms of its associated projects; allows adding and removing projects.

This is a behavioural Mixin class and thus should not be directly accessed. Where possible, encapsulated classes should contain all relevant parameters for their operation as per convention; however, this is not the case for mixin classes. The mixin classes here will explicitly state where they are designed to be used, as an extension of an existing class.

Use case: GroupOperation [ONLY]

add_project(config: str | dict, remote_s3: dict | str | None = None, moles_tags: bool = False)

Add a project to this group.

Parameters:
  • config – (str | dict) The configuration details for the new project. Can be either a path to a JSON file or JSON content directly. Can also be either a properly formatted base config file (needs proj_code, pattern etc.) or a moles_esgf input file.

  • moles_tags – (bool) Option for CEDA staff to integrate output from another package.
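One way to accept either a file path or direct content for a parameter like config is sketched below. The helper load_config and the example values are hypothetical, and the check on proj_code and pattern is only a guess at what "properly formatted" means based on the two keys named above; padocc's own handling may differ.

```python
import json
import os
from typing import Union


def load_config(config: Union[str, dict]) -> dict:
    """Normalise a config given as a dict, a JSON file path, or a JSON string."""
    if isinstance(config, dict):
        loaded = dict(config)  # copy so the caller's dict is not modified
    elif os.path.isfile(config):
        with open(config) as f:
            loaded = json.load(f)
    else:
        loaded = json.loads(config)

    # Assumed minimum for a base config: the two keys named in the docs.
    missing = [key for key in ("proj_code", "pattern") if key not in loaded]
    if missing:
        raise ValueError(f"config is missing required keys: {missing}")
    return loaded


# Hypothetical project details, passed as a dict and as JSON text.
from_dict = load_config({"proj_code": "example_1", "pattern": "/data/*.nc"})
from_text = load_config('{"proj_code": "example_1", "pattern": "/data/*.nc"}')
```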

merge(group_B)

Merge group B into group A.

1. Migrate all projects from B to A and reset groupID values.
2. Combine datasets.csv.
3. Combine project codes.
4. Combine faultlists.

Note: This is not a class method. The self object is replaced by group_A for convenience.
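The four merge steps can be pictured with a small in-memory model. GroupSketch and its fields are hypothetical stand-ins for the group's on-disk files, not padocc classes, and the row formats are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GroupSketch:
    """Hypothetical in-memory stand-in for a group's files."""
    group_id: str
    projects: Dict[str, str] = field(default_factory=dict)  # proj_code -> groupID
    datasets: List[str] = field(default_factory=list)       # rows of datasets.csv
    proj_codes: List[str] = field(default_factory=list)
    faultlist: List[str] = field(default_factory=list)

    def merge(self, other: "GroupSketch") -> None:
        # 1. Migrate all projects and reset their groupID to this group.
        for code in other.projects:
            self.projects[code] = self.group_id
        other.projects.clear()
        # 2. Combine the dataset rows, skipping exact duplicates.
        self.datasets += [r for r in other.datasets if r not in self.datasets]
        # 3. Combine the project codes.
        self.proj_codes += [c for c in other.proj_codes if c not in self.proj_codes]
        # 4. Combine the faultlists.
        self.faultlist += [f for f in other.faultlist if f not in self.faultlist]


group_a = GroupSketch("A", {"p1": "A"}, ["p1,/data/p1"], ["p1"], [])
group_b = GroupSketch("B", {"p2": "B"}, ["p2,/data/p2"], ["p2"], ["p2,scan,Fatal"])
group_a.merge(group_b)
```

As in the note above, the object the method is called on plays the role of group A.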

remove_project(proj_code: str, ask: bool = True) None

Remove a project from this group. Steps required:

1. Remove the project directory including all internal files.
2. Remove the project code from all project files.
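The two removal steps can be sketched against a throwaway directory. The layout used here (a projects/ directory and text listings under proj_codes/) and the function remove_project_sketch are assumptions for illustration; the real group directory structure is not described in this reference.

```python
import shutil
import tempfile
from pathlib import Path


def remove_project_sketch(group_dir: Path, proj_code: str) -> None:
    """Illustrative two-step removal under an assumed directory layout."""
    # Step 1: delete the project directory and everything inside it.
    shutil.rmtree(group_dir / "projects" / proj_code, ignore_errors=True)
    # Step 2: drop the project code from every listing file.
    for listing in (group_dir / "proj_codes").rglob("*.txt"):
        kept = [c for c in listing.read_text().splitlines() if c != proj_code]
        listing.write_text("\n".join(kept) + ("\n" if kept else ""))


with tempfile.TemporaryDirectory() as tmp:
    group_dir = Path(tmp)
    # Build a tiny hypothetical group with two projects and one listing.
    (group_dir / "projects" / "proj_1").mkdir(parents=True)
    (group_dir / "projects" / "proj_2").mkdir(parents=True)
    listing = group_dir / "proj_codes" / "main.txt"
    listing.parent.mkdir()
    listing.write_text("proj_1\nproj_2\n")

    remove_project_sketch(group_dir, "proj_1")

    remaining_dirs = sorted(p.name for p in (group_dir / "projects").iterdir())
    remaining_codes = listing.read_text().splitlines()
```

The sketch omits the confirmation prompt implied by the ask parameter; a real removal is destructive, so prompting first is a sensible default.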

transfer_project(proj_code: str, receiver_group) None

Transfer an existing project to a new group.

unmerge(group_B, dataset_list: list)

Separate elements from group_A into group_B according to the list.

1. Migrate projects.
2. Set the datasets.
3. Set the faultlists.
4. Project codes (remove group B sections).

Note: This is not a class method. The self object is replaced by group_A for convenience.