GroupOperation Core and Mixin Behaviours
Source code for group operations and mixin behaviours.
- class padocc.groups.group.GroupOperation(groupID: str, workdir: str = None, forceful: bool = None, dryrun: bool = None, thorough: bool = None, logger: ~logging.Logger | ~padocc.core.logs.FalseLogger = None, bypass: ~padocc.core.utils.BypassSwitch = <padocc.core.utils.BypassSwitch object>, label: str = None, fh: str = None, logid: str = None, verbose: int = 0)
- complete_group(move_to: str, repeat_id: str = 'main')
Complete all projects for a group.
- get_stac_representation(stac_mapping: dict, repeat_id: str = 'main') → list
Obtain STAC records for all projects in this group.
- classmethod help(func: ~typing.Callable = <function print_fmt_str>)
No public methods
- info() → dict
Obtain a dictionary of key values for this object.
- save_files()
Save all files associated with this group.
- class padocc.groups.mixins.allocations.AllocationsMixin
Enables the use of Allocations for job deployments via Slurm.
This is a behavioural Mixin class and thus should not be accessed directly. Where possible, encapsulated classes should contain all relevant parameters for their operation, as per convention; however, this is not the case for mixin classes. The mixin classes here explicitly state where they are designed to be used, as an extension of an existing class.
Use case: GroupOperation [ONLY]
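The following generic Python illustration shows why a behavioural mixin cannot be used on its own. It is not padocc code. The mixin defines no `__init__` and relies on attributes that only the host class sets. `ToyMixin`, `ToyGroupOperation` and `proj_codes` are hypothetical stand-ins for the mixins and GroupOperation documented here.

```python
class ToyMixin:
    """Hypothetical behavioural mixin: it has no __init__ of its own."""

    def count_projects(self, repeat_id: str = "main") -> int:
        # Relies on `self.proj_codes`, which only the host class sets up.
        return len(self.proj_codes.get(repeat_id, []))


class ToyGroupOperation(ToyMixin):
    """Hypothetical host class standing in for GroupOperation."""

    def __init__(self, group_id: str, proj_codes: dict):
        self.group_id = group_id
        self.proj_codes = proj_codes


group = ToyGroupOperation("example-group", {"main": ["proj-a", "proj-b"]})
print(group.count_projects())  # prints 2
```

Calling `ToyMixin().count_projects()` directly raises `AttributeError`, which is the practical reason for the "Use case: GroupOperation [ONLY]" restriction.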
- create_allocations(phase: str, repeat_id: str, band_increase: str | None = None, binpack: bool = None, **kwargs) → list
Function for assembling all allocations and bands for packing.
Allocations contain multiple processes within a single SLURM job, such that the estimated times sum to less than the time allowed for that SLURM job. Bands run a single process per job, with a time based on a default plus any previous attempts (use the --allow-band-increase flag to enable band increases on successive attempts if previous jobs timed out).
- Returns:
A list of tuples, each representing an array job to submit to SLURM with the attributes (label, time, number_of_datasets). Note: The list of datasets to apply in each array job is typically saved under proj_codes/<repeat_id>/<label>.txt (allocations use allocations/<x>.txt in place of the label).
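The allocation and band rules above can be sketched as follows. This is not padocc's implementation. The sketch uses first-fit decreasing bin packing, which is one common way to keep each allocation's summed estimate strictly below the allowed time. Projects too long for any allocation are returned as candidates for single-process bands. The time units, `increment` value and function names are all hypothetical.

```python
def pack_allocations(estimates: dict, time_allowed: float):
    """First-fit decreasing: each project goes into the first allocation with room."""
    allocations = []  # each entry: [total_time, [proj_codes]]
    leftovers = []    # projects too long for any allocation, to run as bands
    for code, est in sorted(estimates.items(), key=lambda kv: kv[1], reverse=True):
        if est >= time_allowed:
            leftovers.append(code)
            continue
        for alloc in allocations:
            # Keep the summed estimate strictly below the allowed time.
            if alloc[0] + est < time_allowed:
                alloc[0] += est
                alloc[1].append(code)
                break
        else:
            allocations.append([est, [code]])
    return allocations, leftovers


def band_time(default: float, previous_attempts: int,
              increment: float = 30, allow_increase: bool = False) -> float:
    """Time for a single-process band job, optionally growing after timeouts."""
    if not allow_increase:
        return default
    return default + previous_attempts * increment


allocs, bands = pack_allocations({"a": 50, "b": 40, "c": 30, "d": 20, "e": 200}, 100)
# allocs -> [[90, ["a", "b"]], [50, ["c", "d"]]], bands -> ["e"]
```

Sorting longest-first tends to leave fewer, fuller allocations than packing in arbitrary order.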
- deploy_parallel(phase: str, source: str, band_increase: str = None, forceful: bool = None, dryrun: bool = None, thorough: bool = None, verbose: int = 0, binpack: bool = None, time_allowed: str = None, memory: str = None, subset: int = None, repeat_id: str = 'main', bypass: ~padocc.core.utils.BypassSwitch = <padocc.core.utils.BypassSwitch object>, mode: str = 'kerchunk', new_version: str = None, func: ~typing.Callable = <built-in function print>) → None
Organise parallel deployment via SLURM.
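Here is a minimal sketch of turning one (label, time, number_of_datasets) tuple, as returned by create_allocations, into a SLURM array submission. It is not padocc's actual submission code. It uses only standard `sbatch` options (`--job-name`, `--array`, `--time`, `--mem`). The script name, memory default and label value are hypothetical.

```python
def build_sbatch_command(label: str, time: str, n_datasets: int,
                         memory: str = "2G", script: str = "padocc_job.sh") -> list:
    """Build (but do not run) an sbatch call for one array of datasets."""
    if n_datasets < 1:
        raise ValueError("an array job needs at least one dataset")
    return [
        "sbatch",
        f"--job-name={label}",
        f"--array=0-{n_datasets - 1}",  # one array task per dataset index
        f"--time={time}",
        f"--mem={memory}",
        script,
    ]


cmd = build_sbatch_command("band_60", "01:00:00", 3)
print(" ".join(cmd))
# prints: sbatch --job-name=band_60 --array=0-2 --time=01:00:00 --mem=2G padocc_job.sh
```

On a SLURM system the list could be passed to `subprocess.run`. Each array task would then read its dataset from the saved code list using `SLURM_ARRAY_TASK_ID`.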
- padocc.groups.mixins.allocations.get_lotus_reqs(logger)
Extract Lotus config from filesystem.
Use default if no config supplied.
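The "config if present, otherwise defaults" behaviour described for get_lotus_reqs can be sketched like this. The JSON file format, the default keys and values, and the function name are hypothetical assumptions; the real defaults are not documented here.

```python
import json
import os
from typing import Optional

# Hypothetical default requirements; the real defaults are not documented here.
DEFAULT_LOTUS_REQS = {"partition": "standard", "time": "30:00", "memory": "2G"}


def load_lotus_reqs(config_path: Optional[str] = None) -> dict:
    """Return scheduler requirements, falling back to defaults when no config exists."""
    if not config_path or not os.path.isfile(config_path):
        return dict(DEFAULT_LOTUS_REQS)
    with open(config_path) as f:
        user_cfg = json.load(f)
    # Values from the file override the defaults key by key.
    return {**DEFAULT_LOTUS_REQS, **user_cfg}
```

Merging over the defaults means a partial config file only needs to list the values it changes.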
- class padocc.groups.mixins.evaluations.EvaluationsMixin
Group Mixin for methods to evaluate the status of a group.
This is a behavioural Mixin class and thus should not be accessed directly. Where possible, encapsulated classes should contain all relevant parameters for their operation, as per convention; however, this is not the case for mixin classes. The mixin classes here explicitly state where they are designed to be used, as an extension of an existing class.
Use case: GroupOperation [ONLY]
- check_attribute(attribute: str, repeat_id: str = 'main', func: ~collections.abc.Callable = <built-in function print>)
Check an attribute across all projects.
- get_project(proj_code: str)
Get a project operation from this group.
- merge_subsets(subset_list: list[str], combined_id: str, remove_after: bool = False) → None
Merge one or more previously created subsets.
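A minimal sketch of merging subsets is shown below. It assumes each subset is simply a list of project codes keyed by its repeat_id. The in-memory dict stands in for padocc's stored code lists, and `merge_code_lists` is a hypothetical name.

```python
def merge_code_lists(subsets: dict, subset_list: list, combined_id: str,
                     remove_after: bool = False) -> None:
    """Combine several repeat_id code lists into one, keeping first-seen order."""
    combined, seen = [], set()
    for repeat_id in subset_list:
        for code in subsets[repeat_id]:
            if code not in seen:  # a project may appear in several subsets
                seen.add(code)
                combined.append(code)
    subsets[combined_id] = combined
    if remove_after:
        for repeat_id in subset_list:
            if repeat_id != combined_id:
                subsets.pop(repeat_id, None)
```

For example, merging `["a", "b"]` and `["b", "c"]` yields `["a", "b", "c"]` without duplicating `"b"`.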
- remove_by_status(status: str, phase: str | None = None, old_repeat_id: str = 'main') → None
Group projects by their status for removal from the group.
- remove_subset(repeat_id: str) → None
Remove a subset from the group.
- Parameters:
repeat_id – (str) The repeat_id classifying the subset in this group to which this operation will apply.
- repeat_by_status(status: str, new_repeat_id: str, phase: str | None = None, old_repeat_id: str = 'main') → None
Group projects by their status, to then create a new repeat ID.
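Selecting projects by status into a new repeat ID could look like this sketch. It assumes each project's latest state is stored as a (phase, status) pair. That storage shape, the status strings and `select_by_status` are hypothetical; padocc keeps its status records in its own files.

```python
from typing import Optional


def select_by_status(subsets: dict, statuses: dict, status: str, new_repeat_id: str,
                     phase: Optional[str] = None, old_repeat_id: str = "main") -> list:
    """Copy codes from old_repeat_id whose (phase, status) match into new_repeat_id."""
    selected = []
    for code in subsets[old_repeat_id]:
        proj_phase, proj_status = statuses[code]
        if proj_status == status and (phase is None or proj_phase == phase):
            selected.append(code)
    subsets[new_repeat_id] = selected  # the original subset is left unchanged
    return selected
```

A removal variant, in the spirit of remove_by_status, could filter the same matches out of the subset instead of copying them.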
- summarise_data(repeat_id: str = 'main', func: ~collections.abc.Callable = <built-in function print>)
Summarise data stored across all projects, mostly concatenating results from the detail-cfg files from all projects.
- summarise_status(repeat_id: str = 'main', specific_phase: str | None = None, specific_error: str | None = None, long_display: bool | None = None, display_upto: int = 5, halt: bool = False, write: bool = False, fn: ~collections.abc.Callable = <built-in function print>) → None
Gives a general overview of progress within the pipeline:
  - The number of datasets currently at each stage of the pipeline
  - Errors within each pipeline phase
  - Examination of error logs
  - Saving codes that match an error type into a new repeat group
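The first two parts of that overview (datasets per stage and errors per phase) amount to a grouped count, sketched here with `collections.Counter`. It reuses the assumption that each project's state is a hypothetical (phase, status) pair. The `"Success"` label is an assumed placeholder.

```python
from collections import Counter, defaultdict


def summarise(statuses: dict, success: str = "Success"):
    """Count projects per phase and tally non-success statuses within each phase."""
    per_phase = Counter(phase for phase, _ in statuses.values())
    errors = defaultdict(Counter)
    for phase, status in statuses.values():
        if status != success:
            errors[phase][status] += 1
    return per_phase, dict(errors)


per_phase, errors = summarise({
    "a": ("scan", "Success"),
    "b": ("compute", "Fatal"),
    "c": ("scan", "Fatal"),
    "d": ("scan", "ValueError"),
})
```

`per_phase.most_common()` then gives the busiest stages first, which suits a limited display such as `display_upto`.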
- class padocc.groups.mixins.initialisation.InitialisationMixin
Mixin container class for initialisation routines for groups via input files.
This is a behavioural Mixin class and thus should not be accessed directly. Where possible, encapsulated classes should contain all relevant parameters for their operation, as per convention; however, this is not the case for mixin classes. The mixin classes here explicitly state where they are designed to be used, as an extension of an existing class.
Use case: GroupOperation [ONLY]
- init_from_file(input_file: str, substitutions: dict = None, remote_s3: dict | str | None = None) → None
Run initialisation by loading configurations from input sources, determining the input file type and using the appropriate functions to instantiate group and project directories.
- Parameters:
input_file – (str) Path to an input file from which to initialise the project.
- Returns:
None
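The "determine input file type" step can be sketched as a dispatch on the file extension. The sketch assumes CSV and JSON inputs that describe one project per row or object. The supported types, column names and `load_input` are hypothetical and do not reproduce padocc's parsers.

```python
import csv
import json
from pathlib import Path


def load_input(input_file: str) -> list:
    """Return a list of project records, choosing a parser from the file extension."""
    suffix = Path(input_file).suffix.lower()
    if suffix not in (".json", ".csv"):
        raise ValueError(f"unsupported input file type: {suffix!r}")
    with open(input_file, newline="") as f:
        if suffix == ".json":
            data = json.load(f)
            # Accept a single project object or a list of them.
            return data if isinstance(data, list) else [data]
        # CSV: one project per row, keyed by the header line.
        return list(csv.DictReader(f))
```

Checking the extension before opening the file means unsupported inputs fail early with a clear message.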
- class padocc.groups.mixins.modifiers.ModifiersMixin
Modifiers for the projects associated with a group, allowing projects to be added and removed.
This is a behavioural Mixin class and thus should not be accessed directly. Where possible, encapsulated classes should contain all relevant parameters for their operation, as per convention; however, this is not the case for mixin classes. The mixin classes here explicitly state where they are designed to be used, as an extension of an existing class.
Use case: GroupOperation [ONLY]
- add_project(config: str | dict, remote_s3: dict | str | None = None, moles_tags: bool = False)
Add a project to this group.
- Parameters:
config – (str | dict) The configuration details for the new project. This can be either a path to a JSON file or the JSON content directly. The content can be either a properly formatted base config (requires proj_code, pattern, etc.) or a moles_esgf input file.
moles_tags – (bool) Option for CEDA staff to integrate output from another package.
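Accepting config as either a path or content can be sketched as a small normalisation step. It only checks the proj_code and pattern keys named above and does not handle moles_esgf input files. `normalise_config` and `REQUIRED_KEYS` are hypothetical names, and padocc's own validation is likely broader.

```python
import json
from typing import Union

REQUIRED_KEYS = ("proj_code", "pattern")  # minimum keys named in the description above


def normalise_config(config: Union[str, dict]) -> dict:
    """Accept a JSON file path or a dict of JSON content and check the required keys."""
    if isinstance(config, dict):
        cfg = dict(config)  # copy so the caller's dict is left untouched
    else:
        with open(config) as f:
            cfg = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in cfg]
    if missing:
        raise KeyError(f"config is missing required keys: {missing}")
    return cfg
```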
- merge(group_B)
Merge group B into group A:
1. Migrate all projects from B to A and reset their groupID values.
2. Combine datasets.csv.
3. Combine project codes.
4. Combine faultlists.
Note: This is not a class method; for convenience, group_A in this description stands for the self object.
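The four merge steps can be mirrored on an in-memory stand-in for a group's files. `ToyGroup` and its fields are hypothetical. The real method works on padocc's project directories, datasets.csv and code-list files rather than Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGroup:
    """Hypothetical in-memory stand-in for a group's files."""
    group_id: str
    projects: dict = field(default_factory=dict)    # proj_code -> project config
    datasets: list = field(default_factory=list)    # rows of datasets.csv
    proj_codes: dict = field(default_factory=dict)  # repeat_id -> list of codes
    faultlist: list = field(default_factory=list)


def merge_groups(group_a: ToyGroup, group_b: ToyGroup) -> None:
    # 1. Migrate all projects from B to A and reset their groupID.
    for code, project in group_b.projects.items():
        project["groupID"] = group_a.group_id
        group_a.projects[code] = project
    # 2. Combine datasets.csv rows.
    group_a.datasets.extend(group_b.datasets)
    # 3. Combine project codes, repeat_id by repeat_id.
    for repeat_id, codes in group_b.proj_codes.items():
        group_a.proj_codes.setdefault(repeat_id, []).extend(codes)
    # 4. Combine faultlists.
    group_a.faultlist.extend(group_b.faultlist)
```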
- remove_project(proj_code: str, ask: bool = True) → None
Remove a project from this group. Steps required:
1. Remove the project directory, including all internal files.
2. Remove the project code from all project files.
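The two removal steps can be sketched on a hypothetical directory layout. Each project lives in `<group_dir>/<proj_code>/`, and code lists are text files under `<group_dir>/proj_codes/` with one code per line. This layout and `remove_project_files` are assumptions, not padocc's actual structure. The sketch also omits the `ask` parameter.

```python
import shutil
from pathlib import Path


def remove_project_files(group_dir: Path, proj_code: str) -> None:
    """Delete a project's directory and drop its code from every code-list file."""
    # 1. Remove the project directory including all internal files.
    project_dir = group_dir / proj_code
    if project_dir.is_dir():
        shutil.rmtree(project_dir)
    # 2. Remove the project code from each code list (one code per line).
    for code_file in (group_dir / "proj_codes").glob("*.txt"):
        kept = [c for c in code_file.read_text().splitlines() if c != proj_code]
        code_file.write_text("".join(f"{c}\n" for c in kept))
```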
- transfer_project(proj_code: str, receiver_group) → None
Transfer an existing project to a new group.
- unmerge(group_B, dataset_list: list)
Separate the elements in dataset_list from group_A into group_B:
1. Migrate projects.
2. Set the datasets.
3. Set the faultlists.
4. Set the project codes (removing group B sections from group A).
Note: This is not a class method; for convenience, group_A in this description stands for the self object.
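The unmerge steps can be mirrored on the same kind of hypothetical in-memory stand-in. Projects named in dataset_list move to group B, and their rows, faultlist entries and code-list sections are split off from group A. `ToyGroup` and `unmerge_groups` are illustrative names only. The exact handling of each file in padocc may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGroup:
    """Hypothetical in-memory stand-in for a group's files."""
    group_id: str
    projects: dict = field(default_factory=dict)    # proj_code -> project config
    datasets: list = field(default_factory=list)    # rows of datasets.csv
    proj_codes: dict = field(default_factory=dict)  # repeat_id -> list of codes
    faultlist: list = field(default_factory=list)


def unmerge_groups(group_a: ToyGroup, group_b: ToyGroup, dataset_list: list) -> None:
    moving = set(dataset_list)
    # 1. Migrate the listed projects from A to B.
    for code in dataset_list:
        project = group_a.projects.pop(code)
        project["groupID"] = group_b.group_id
        group_b.projects[code] = project
    # 2. Set the datasets: rows for moved projects go to B.
    group_b.datasets = [r for r in group_a.datasets if r["proj_code"] in moving]
    group_a.datasets = [r for r in group_a.datasets if r["proj_code"] not in moving]
    # 3. Set the faultlists the same way.
    group_b.faultlist = [c for c in group_a.faultlist if c in moving]
    group_a.faultlist = [c for c in group_a.faultlist if c not in moving]
    # 4. Project codes: B gets its sections, which are removed from A.
    for repeat_id, codes in list(group_a.proj_codes.items()):
        b_codes = [c for c in codes if c in moving]
        if b_codes:
            group_b.proj_codes[repeat_id] = b_codes
        group_a.proj_codes[repeat_id] = [c for c in codes if c not in moving]
```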