extraction_methods.plugins package

Subpackages

Submodules

extraction_methods.plugins.bbox module

Bounding Box Method

class extraction_methods.plugins.bbox.BboxExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: bbox

Description:: Converts coordinate values to an RFC 7946, section 5 formatted bbox.

Configuration Options:

- ``west``: ``REQUIRED`` Most westerly coordinate
- ``south``: ``REQUIRED`` Most southerly coordinate
- ``east``: ``REQUIRED`` Most easterly coordinate
- ``north``: ``REQUIRED`` Most northerly coordinate

Example Configuration:

.. code-block:: yaml

    method: bbox
    inputs:
      west: 0
      south: 0
      east: $east_variable
      north: $north_variable

input_class: alias of BboxInput

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict

class extraction_methods.plugins.bbox.BboxInput(*, exists_key: str = '$', exists_delimiter: str = '.', west: float | str, south: float | str, east: float | str, north: float | str)

Bases: Input

Model for BBox Method Input.

east: float | str

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'east': FieldInfo(annotation=Union[float, str], required=True, description='east coordinate.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'north': FieldInfo(annotation=Union[float, str], required=True, description='north coordinate.'), 'south': FieldInfo(annotation=Union[float, str], required=True, description='south coordinate.'), 'west': FieldInfo(annotation=Union[float, str], required=True, description='west coordinate.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

north: float | str

south: float | str

west: float | str
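As a rough illustration of what the ``bbox`` method produces, the sketch below assembles the four inputs into the ``[west, south, east, north]`` order that RFC 7946, section 5 specifies for a 2D bounding box. The helpers ``resolve`` and ``to_bbox`` are hypothetical names, not part of the package, and the ``$`` lookup only mimics the documented ``exists_key`` default.

```python
from typing import Any


def resolve(value: Any, body: dict[str, Any], exists_key: str = "$") -> Any:
    """Look up ``$name`` style values in the body; pass other values through."""
    if isinstance(value, str) and value.startswith(exists_key):
        return body[value[len(exists_key):]]
    return value


def to_bbox(inputs: dict[str, Any], body: dict[str, Any]) -> list[float]:
    """Return a 2D bounding box ordered [west, south, east, north]."""
    return [
        float(resolve(inputs[name], body))
        for name in ("west", "south", "east", "north")
    ]


# Hypothetical previously extracted properties and method inputs.
body = {"east_variable": 10.5, "north_variable": "45"}
inputs = {"west": 0, "south": 0, "east": "$east_variable", "north": "$north_variable"}
print(to_bbox(inputs, body))  # [0.0, 0.0, 10.5, 45.0]
```

How the real method stores the result in the body is not shown here; the sketch covers only the ordering and the variable substitution.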

extraction_methods.plugins.ceda_observation module

CEDA Observation Method

class extraction_methods.plugins.ceda_observation.CEDAObservationExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: ceda_observation

Description:: Returns a CEDA observation record for the input_term.

Configuration Options:

- ``input_term``: ``REQUIRED`` term for method to run on

Example Configuration:

.. code-block:: yaml

    method: ceda_observation
    inputs:
      input_term: $url

input_class: alias of CEDAObservationInput

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict

class extraction_methods.plugins.ceda_observation.CEDAObservationInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri', request_timeout: int = 15, output_key: str = 'uuid')

Bases: Input

Model for CEDA Observation Method Input.

input_term: str

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='term for method to run on.'), 'output_key': FieldInfo(annotation=str, required=False, default='uuid', description='key to output to.'), 'request_timeout': FieldInfo(annotation=int, required=False, default=15, description='request time out.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_key: str

request_timeout: int

extraction_methods.plugins.ceda_vocabulary module

CEDA Vocabulary Method

class extraction_methods.plugins.ceda_vocabulary.CEDAVocabularyExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: ceda_vocabulary

Description:: Validates and sorts properties into vocabs and generates the general vocab for specified properties.

Configuration Options:

- ``url``: ``REQUIRED`` url of vocabulary server
- ``namespace``: ``REQUIRED`` namespace of vocab for terms
- ``terms``: Terms to be validated
- ``strict``: Boolean on whether values should be validated
- ``request_timeout``: request time out

Example Configuration:

.. code-block:: yaml

    method: ceda_vocabulary
    inputs:
      url: vocab.ceda.ac.uk
      namespace: cmip6
      strict: False
      terms:
        - start_time
        - model

input_class: alias of CEDAVocabularyInput

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict

class extraction_methods.plugins.ceda_vocabulary.CEDAVocabularyInput(*, exists_key: str = '$', exists_delimiter: str = '.', url: str, namespace: str, strict: bool = False, terms: list[str] = [], request_timeout: int = 15)

Bases: Input

Model for CEDA Vocab Method Input.

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'namespace': FieldInfo(annotation=str, required=True, description='Namespace for vocab terms.'), 'request_timeout': FieldInfo(annotation=int, required=False, default=15, description='request time out.'), 'strict': FieldInfo(annotation=bool, required=False, default=False, description='True if values should be validated.'), 'terms': FieldInfo(annotation=list[str], required=False, default=[], description='terms to be validated.'), 'url': FieldInfo(annotation=str, required=True, description='URL of vocabulary server.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

namespace: str

request_timeout: int

strict: bool

terms: list[str]

url: str

extraction_methods.plugins.controlled_vocabulary module

Controlled Vocabulary Method

class extraction_methods.plugins.controlled_vocabulary.ControlledVocabularyExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: controlled_vocabulary

Description:: Compare properties to a controlled vocabulary defined by a pydantic.BaseModel.

Configuration Options:

- ``model``: pydantic.BaseModel subclass to be imported at run-time, e.g. `package.module.class_name`
- ``strict``: If True, raise ValidationError, otherwise simply log ValidationError messages

Example Configuration:

- name: controlled_vocabulary
  inputs:
    model: my_cv.collections.CMIP5
    strict: False

input_class: alias of ControlledVocabularyInput

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict

class extraction_methods.plugins.controlled_vocabulary.ControlledVocabularyInput(*, exists_key: str = '$', exists_delimiter: str = '.', model: str, strict: bool = False)

Bases: Input

Model for Controlled Vocabulary Method Input.

model: str

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'model': FieldInfo(annotation=str, required=True, description='pydantic.BaseModel subclass to be imported at run-time, e.g. `package.module.class_name`.'), 'strict': FieldInfo(annotation=bool, required=False, default=False, description='If True, raise ValidationError, otherwise simply log ValidationError messages.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

strict: bool

extraction_methods.plugins.datetime_bound_to_centroid module

Datetime Bound to Centroid Method

class extraction_methods.plugins.datetime_bound_to_centroid.DatetimeBoundToCentroidExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: datetime_bound_to_centroid

Description:: Accepts start and end datetime bounds and returns the datetime at their centroid (midpoint).

Configuration Options:

- ``start_datetime``: Start datetime bound
- ``start_format``: Format of the start datetime
- ``end_datetime``: End datetime bound
- ``end_format``: Format of the end datetime
- ``output_key``: Term for method to output to
- ``output_format``: Format of the output datetime

Example Configuration:

.. code-block:: yaml

    method: datetime_bound_to_centroid
    inputs:
      start_datetime: $start_date
      end_datetime: "2022-02-02"
      end_format: "%Y-%m-%d"
      output_key: polygon

input_class: alias of DatetimeBoundToCentroidInput

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict

strip_time(datetime_str: str, datetime_format: str) → datetime

Parse a datetime from a string value.

Parameters:

datetime_str (str) – string to convert to datetime
datetime_format (str) – format of datetime string

Returns:

datetime object

Return type:

datetime

class extraction_methods.plugins.datetime_bound_to_centroid.DatetimeBoundToCentroidInput(*, exists_key: str = '$', exists_delimiter: str = '.', start_datetime: str = '$start_datetime', start_format: str = '%Y-%m-%dT%H:%M:%S', end_datetime: str = '$end_datetime', end_format: str = '%Y-%m-%dT%H:%M:%S', output_key: str = 'datetime', output_format: str = '%Y-%m-%dT%H:%M:%SZ')

Bases: Input

Model for Datetime Bound to Centroid Method Input.

end_datetime: str

end_format: str

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'end_datetime': FieldInfo(annotation=str, required=False, default='$end_datetime', description='End datetime bound.'), 'end_format': FieldInfo(annotation=str, required=False, default='%Y-%m-%dT%H:%M:%S', description='Format of end datetime.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'output_format': FieldInfo(annotation=str, required=False, default='%Y-%m-%dT%H:%M:%SZ', description='format of output.'), 'output_key': FieldInfo(annotation=str, required=False, default='datetime', description='key to output to.'), 'start_datetime': FieldInfo(annotation=str, required=False, default='$start_datetime', description='Start datetime bound.'), 'start_format': FieldInfo(annotation=str, required=False, default='%Y-%m-%dT%H:%M:%S', description='Format for start datetime.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_format: str

output_key: str

start_datetime: str

start_format: str
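The following is a minimal sketch of the centroid calculation, assuming that "centroid" means the midpoint between the two bounds. The function name ``datetime_centroid`` is hypothetical; the default formats are copied from the documented ``DatetimeBoundToCentroidInput`` fields.

```python
from datetime import datetime


def datetime_centroid(
    start: str,
    end: str,
    start_format: str = "%Y-%m-%dT%H:%M:%S",
    end_format: str = "%Y-%m-%dT%H:%M:%S",
    output_format: str = "%Y-%m-%dT%H:%M:%SZ",
) -> str:
    """Return the midpoint between two datetime strings, formatted for output."""
    start_dt = datetime.strptime(start, start_format)
    end_dt = datetime.strptime(end, end_format)
    # Adding half of the interval to the start gives the midpoint.
    centroid = start_dt + (end_dt - start_dt) / 2
    return centroid.strftime(output_format)


print(datetime_centroid("2022-01-31T00:00:00", "2022-02-02", end_format="%Y-%m-%d"))
# 2022-02-01T00:00:00Z
```

Each bound is parsed with its own format, which is why the start and end formats are separate options.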

extraction_methods.plugins.default module

Default Method

class extraction_methods.plugins.default.DefaultExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: default

Description:: Takes a set of default facets.

Configuration Options:

- ``defaults``: Dictionary of defaults to be added

Example Configuration:

.. code-block:: yaml

    method: default
    inputs:
      defaults:
        mip_era: CMIP6

input_class: alias of DefaultInput

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict

class extraction_methods.plugins.default.DefaultInput(*, exists_key: str = '$', exists_delimiter: str = '.', defaults: dict[str, Any])

Bases: Input

Model for Default Method Input.

defaults: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'defaults': FieldInfo(annotation=dict[str, Any], required=True, description='Defaults to be added.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

extraction_methods.plugins.dict_aggregator module

Dictionary Aggregator Method

class extraction_methods.plugins.dict_aggregator.DictAggregatorExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: dict_aggregator

Description:: Aggregate information within dictionary.

Configuration Options:

- ``min``: list of terms for which the minimum of their aggregate should be returned
- ``max``: list of terms for which the maximum of their aggregate should be returned
- ``sum``: list of terms for which the sum of their aggregate should be returned
- ``list``: list of terms for which a list of their aggregate should be returned
- ``mean``: list of terms for which the mean of their aggregate should be returned

Configuration Example:

.. code-block:: yaml

    method: dict_aggregator
    inputs:
      min:
        - start_time
      max:
        - end_time
      sum:
        - size
      list:
        - term1
        - term2

input_class: alias of DictAggregatorInput

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict

class extraction_methods.plugins.dict_aggregator.DictAggregatorInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str | dict[str, Any] = '$assets', min: list[KeyOutputKey] = [], max: list[KeyOutputKey] = [], sum: list[KeyOutputKey] = [], mean: list[KeyOutputKey] = [], bucket: list[KeyOutputKey] = [])

Bases: Input

Model for Dictionary Aggregator Method Input.

bucket: list[KeyOutputKey]

input_term: str | dict[str, Any]

max: list[KeyOutputKey]

mean: list[KeyOutputKey]

min: list[KeyOutputKey]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'bucket': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the list of their aggregate should be returned.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'input_term': FieldInfo(annotation=Union[str, dict[str, Any]], required=False, default='$assets', description='term for method to run on.'), 'max': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the maximum of their aggregate should be returned.'), 'mean': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the mean of their summed aggregate should be returned.'), 'min': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the minimum of their aggregate should be returned.'), 'sum': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the sum of their aggregate should be returned.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

sum: list[KeyOutputKey]
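To make the aggregation options concrete, here is a small sketch that reduces the values of each term across a dictionary of assets. It is an illustration only: ``REDUCERS`` and ``aggregate`` are hypothetical names, terms are plain strings rather than ``KeyOutputKey`` objects (whose structure is not documented here), and the shape of the ``assets`` dictionary is assumed.

```python
from statistics import mean
from typing import Any, Callable

# Hypothetical mapping from option name to the reduction it applies.
REDUCERS: dict[str, Callable[[list[Any]], Any]] = {
    "min": min,
    "max": max,
    "sum": sum,
    "mean": mean,
    "list": list,
}


def aggregate(
    assets: dict[str, dict[str, Any]], options: dict[str, list[str]]
) -> dict[str, Any]:
    """Apply each configured reduction to the values of a term across all assets."""
    output: dict[str, Any] = {}
    for option, terms in options.items():
        for term in terms:
            # Collect the term from every asset that defines it.
            values = [asset[term] for asset in assets.values() if term in asset]
            if values:
                output[term] = REDUCERS[option](values)
    return output


assets = {
    "a": {"start_time": 1, "end_time": 5, "size": 100},
    "b": {"start_time": 3, "end_time": 9, "size": 250},
}
print(aggregate(assets, {"min": ["start_time"], "max": ["end_time"], "sum": ["size"]}))
# {'start_time': 1, 'end_time': 9, 'size': 350}
```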

extraction_methods.plugins.elasticsearch_aggregation module

Elasticsearch Aggregation Method

class extraction_methods.plugins.elasticsearch_aggregation.ElasticsearchAggregationExtract(**kwargs: Any)

Bases: ExtractionMethod

Method: elasticsearch_aggregation

Description:: Using an ID, generate a summary of information for higher-level entities.

Configuration Options:

- ``index``: Name of the index holding the STAC entities
- ``id_term``: Term used for aggregating the STAC entities
- ``client_kwargs``: Session parameters passed to `elasticsearch.Elasticsearch <https://elasticsearch-py.readthedocs.io/en/7.10.0/api.html>`_
- ``bbox``: list of terms for which their aggregate bbox should be returned
- ``min``: list of terms for which the minimum of their aggregate should be returned
- ``max``: list of terms for which the maximum of their aggregate should be returned
- ``sum``: list of terms for which the sum of their aggregate should be returned
- ``list``: list of terms for which a list of their aggregate should be returned

Configuration Example:

.. code-block:: yaml

    method: elasticsearch_aggregation
    inputs:
      index: ceda-index
      id_term: item_id
      client_kwargs:
        hosts: ['host1:9200', 'host2:9200']
      bbox:
        - bbox
      min:
        - start_time
      max:
        - end_time
      sum:
        - size
      list:
        - term1
        - term2

base_query() → dict[str, Any]

Base query to filter the results to a single collection.

Returns:: base query
Return type:: dict

static basic_aggregation(agg_type: str, facet: KeyOutputKey) → dict[str, Any]

Query to retrieve an aggregated value of the given type (for example, the minimum) from docs.

Parameters:

agg_type (str) – type of aggregation
facet (KeyOutputKey) – facet to aggregate

Returns:

basic aggregation query

Return type:

dict

construct_query() → dict[str, Any]

Function to create the initial elasticsearch query.

Returns:: aggregation query
Return type:: dict

extract_facet(aggregations: dict[str, Any], facet: KeyOutputKey) → Any

Function to extract the given facets from the aggregation.

Parameters:

aggregations (dict) – aggregations
facet – facet to be extracted

Returns:

extracted facet

Return type:

Any

extract_facet_lists(query: dict[str, Any], aggregations: dict[str, Any], facets: list[KeyOutputKey]) → dict[str, Any]

Function to extract the lists of given facets from the aggregation.

Parameters:

query (dict) – attribute dictionary to update
aggregations (dict) – current generated properties
facets (list) – facets to be extracted

Returns:

extracted list facets

Return type:

dict

extract_first_facet(properties: dict[str, Any], facet: KeyOutputKey) → Any

Function to extract the given default facets from the first hit.

Parameters:

properties (dict) – properties from first record
facet – current facet to be extracted

Returns:

extracted facet

Return type:

Any

extract_metadata(query: dict[str, Any], result: dict[str, Any]) → dict[str, Any]

Function to extract the required metadata from the returned query result.

Parameters:

query (dict) – previous query
result (dict) – results from previous query

Returns:

metadata

Return type:

dict

static facet_composite_aggregation(facet: KeyOutputKey) → dict[str, Any]

Generate the composite aggregation for the facet.

Parameters:: facet (KeyOutputKey) – facet to aggregate
Returns:: composite aggregation query
Return type:: dict

run(body: dict[str, Any]) → dict[str, Any]

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict

class extraction_methods.plugins.elasticsearch_aggregation.ElasticsearchAggregationInput(*, exists_key: str = '$', exists_delimiter: str = '.', index: str, id_term: str, client_kwargs: dict[str, Any] = {}, search_query: dict[str, Any] = {'bool': {'must': [{'term': {'path': {'value': '$uri'}}}], 'must_not': [{'term': {'categories.keyword': {'value': 'hidden'}}}]}}, geo_bound: list[KeyOutputKey] = [], first: list[KeyOutputKey] = [], min: list[KeyOutputKey] = [], max: list[KeyOutputKey] = [], sum: list[KeyOutputKey] = [], mean: list[KeyOutputKey] = [], bucket: list[KeyOutputKey] = [], request_tiemout: int = 15, allow_multiple: bool = True, output_key: str = 'label')

Bases: Input

Model for Elasticsearch Aggregation Input.

allow_multiple: bool

bucket: list[KeyOutputKey]

client_kwargs: dict[str, Any]

first: list[KeyOutputKey]

geo_bound: list[KeyOutputKey]

id_term: str

index: str

max: list[KeyOutputKey]

mean: list[KeyOutputKey]

min: list[KeyOutputKey]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'allow_multiple': FieldInfo(annotation=bool, required=False, default=True, description='True if multiple labels are allowed.'), 'bucket': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the list of their aggregate should be returned.'), 'client_kwargs': FieldInfo(annotation=dict[str, Any], required=False, default={}, description='Parameters passed to elasticsearch client.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'first': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description="list of terms for which the first record's value should be returned."), 'geo_bound': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the minimum of their aggregate should be returned.'), 'id_term': FieldInfo(annotation=str, required=True, description='Term used for agregating the STAC entities.'), 'index': FieldInfo(annotation=str, required=True, description='Name of the index holding the STAC entities.'), 'max': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the maximum of their aggregate should be returned.'), 'mean': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the mean of their summed aggregate should be returned.'), 'min': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the minimum of their aggregate should be returned.'), 'output_key': FieldInfo(annotation=str, required=False, default='label', description='key to output to.'), 'request_tiemout': FieldInfo(annotation=int, required=False, default=15, 
description='Time out for search.'), 'search_query': FieldInfo(annotation=dict[str, Any], required=False, default={'bool': {'must_not': [{'term': {'categories.keyword': {'value': 'hidden'}}}], 'must': [{'term': {'path': {'value': '$uri'}}}]}}, description='Session parameters passed to elasticsearch client.'), 'sum': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the sum of their aggregate should be returned.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_key: str

request_tiemout: int

search_query: dict[str, Any]

sum: list[KeyOutputKey]
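The sketch below shows one way a search body of this kind could be assembled: the documented default ``search_query`` filter combined with single-metric aggregations. It is not the package's implementation. ``metric_aggregation`` and ``build_query`` are hypothetical names, and using the term name directly as both the aggregation name and the field path is an assumption.

```python
from typing import Any


def metric_aggregation(agg_type: str, field: str) -> dict[str, Any]:
    """Build a single-metric aggregation such as min, max or sum on one field."""
    return {field: {agg_type: {"field": field}}}


def build_query(uri: str, aggregations: dict[str, list[str]]) -> dict[str, Any]:
    """Combine a filter for one path with the requested metric aggregations."""
    query: dict[str, Any] = {
        "size": 0,  # this sketch only reads aggregation results, not the hits
        "query": {
            "bool": {
                "must": [{"term": {"path": {"value": uri}}}],
                "must_not": [{"term": {"categories.keyword": {"value": "hidden"}}}],
            }
        },
        "aggs": {},
    }
    for agg_type, fields in aggregations.items():
        for field in fields:
            query["aggs"].update(metric_aggregation(agg_type, field))
    return query


# "/data/example" is a placeholder path.
query = build_query("/data/example", {"min": ["start_time"], "max": ["end_time"]})
print(query["aggs"])
```

The resulting dictionary is a plain Elasticsearch search body; sending it to a cluster and reading the ``aggregations`` section of the response is left out.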

extraction_methods.plugins.general_function module

General Function Method

class extraction_methods.plugins.general_function.Function(*, name: str, args: list[Any] = [], kwargs: dict[str, Any] = {})

Bases: BaseModel

Model for Function.

args: list[Any]

kwargs: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'args': FieldInfo(annotation=list[Any], required=False, default=[], description='list of arguments for function.'), 'kwargs': FieldInfo(annotation=dict[str, Any], required=False, default={}, description='dictionary of key word arguments for function.'), 'name': FieldInfo(annotation=str, required=True, description='Name of function.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

name: str

class extraction_methods.plugins.general_function.GeneralFunctionExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: general_function

Description:: Imports the function named in ``function`` and runs it with the given ``args`` and ``kwargs``. The response is written to ``output_key`` if one is set, otherwise it is merged into the body.

Configuration Options:

- ``function``: ``REQUIRED`` Function to be run, given as ``name``, ``args``, and ``kwargs``.
- ``delimiter``: Optional text delimiter to put between module/function names. Default: ``"."``
- ``output_key``: Optional name of the key to output to; otherwise the response will be merged.

Example Configuration:

.. code-block:: yaml

    method: general_function
    inputs:
      function:
        name: import.path.to.the.function
        args:
          - hello
          - world
        kwargs:
          hello: world
          foo: bar

input_class: alias of GeneralFunctionInput

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict

class extraction_methods.plugins.general_function.GeneralFunctionInput(*, exists_key: str = '$', exists_delimiter: str = '.', function: Function, delimiter: str = '.', output_key: str = '')

Bases: Input

Model for General Function Input.

delimiter: str

function: Function

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'delimiter': FieldInfo(annotation=str, required=False, default='.', description='text delimiter to put between module/function names.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'function': FieldInfo(annotation=Function, required=True, description='Function to be run name maybe seperatated my delimieter.'), 'output_key': FieldInfo(annotation=str, required=False, default='', description='key to output to, else response will be merged with body.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_key: str
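A minimal sketch of the idea behind this method, assuming the function is located by splitting its name on the delimiter and importing the module part. The helpers ``run_function`` and ``apply_result`` are hypothetical, and ``math.hypot`` stands in for a user-supplied function.

```python
from importlib import import_module
from typing import Any


def run_function(
    name: str, args: list[Any], kwargs: dict[str, Any], delimiter: str = "."
) -> Any:
    """Import ``module<delimiter>function`` and call it with the given arguments."""
    module_path, _, function_name = name.rpartition(delimiter)
    # import_module expects dots, so translate a custom delimiter back.
    module = import_module(module_path.replace(delimiter, "."))
    return getattr(module, function_name)(*args, **kwargs)


def apply_result(body: dict[str, Any], result: Any, output_key: str = "") -> dict[str, Any]:
    """Store the result under output_key, or merge it into the body if no key is set."""
    if output_key:
        return {**body, output_key: result}
    return {**body, **result}


result = run_function("math.hypot", [3, 4], {})
print(apply_result({"id": "abc"}, result, output_key="distance"))
# {'id': 'abc', 'distance': 5.0}
```

Merging without an ``output_key`` only works when the function returns a dictionary, which is why the sketch sets a key for the scalar result.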

extraction_methods.plugins.geometry module

Geometry Method

class extraction_methods.plugins.geometry.GeometryExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: geometry

Description:: Accepts a dictionary of coordinate values and converts them to RFC 7946 formatted geometry.

Configuration Options:

- ``type``: ``REQUIRED`` Type of geometry to be produced.
- ``coordinates``: ``REQUIRED`` list of coordinates to convert to geometry. Ordering is respected.
- ``output_key``: key to output to.

Example Configuration:

.. code-block:: yaml

    name: geometry
    inputs:
      type: line
      coordinates:
        - 0
        - 0
        - $lon_2
        - $lat_2

get_coordinates(coordinate_type: str, coordinates: list[Any]) → list[Any]

Get coordinates

Parameters:

coordinate_type (str) – type of coordinates
coordinates (list) – list of coordinates

Returns:

coordinates

Return type:

list

input_class: alias of GeometryInput

line(coordinates: list[list[str | float]]) → list[list[float]]

Get line coordinates

Parameters:: coordinates (list) – list of coordinates
Returns:: coordinates
Return type:: list

multi(coordinate_type: str, coordinates: list[Any]) → list[Any]

Get coordinates for a multi-part geometry

Parameters:

coordinate_type (str) – type of coordinates
coordinates (list) – list of coordinates

Returns:

coordinates

Return type:

list

point(coordinates: list[str | float]) → list[float]

Get point coordinates

Parameters:: coordinates (list) – list of coordinates
Returns:: coordinates
Return type:: list

polygon(coordinates: list[list[str | float]]) → list[list[list[float]]]

Get polygon coordinates

Parameters:: coordinates (list) – list of coordinates
Returns:: coordinates
Return type:: list

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict

class extraction_methods.plugins.geometry.GeometryInput(*, exists_key: str = '$', exists_delimiter: str = '.', type: Literal['Point', 'LineString', 'Polygon', 'MultiPointString', 'MultiLineString', 'MultiPolygon'], coordinates: list[Any], output_key: str = 'geometry')

Bases: Input

Model for Geometry Input.

coordinates: list[Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'coordinates': FieldInfo(annotation=list[Any], required=True, description='list of coordinates to convert to geometry. Ordering is respected.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'output_key': FieldInfo(annotation=str, required=False, default='geometry', description='key to output to.'), 'type': FieldInfo(annotation=Literal[str, str, str, str, str, str], required=True, description='Type of geometry to be produced.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_key: str

type: Literal['Point', 'LineString', 'Polygon', 'MultiPointString', 'MultiLineString', 'MultiPolygon']
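For orientation, the sketch below builds RFC 7946 geometry objects for the three basic types. It is a hedged illustration rather than the package's code: ``to_position`` and ``build_geometry`` are hypothetical names, coordinates are assumed to arrive as ``[lon, lat]`` pairs, and closing an open polygon ring is a choice made by this sketch.

```python
from typing import Any


def to_position(coordinate: list[Any]) -> list[float]:
    """Convert one [lon, lat] pair, given as numbers or numeric strings, to floats."""
    return [float(value) for value in coordinate]


def build_geometry(geometry_type: str, coordinates: list[Any]) -> dict[str, Any]:
    """Build a GeoJSON geometry object for Point, LineString or Polygon."""
    if geometry_type == "Point":
        converted: list[Any] = to_position(coordinates)
    elif geometry_type == "LineString":
        converted = [to_position(pair) for pair in coordinates]
    elif geometry_type == "Polygon":
        ring = [to_position(pair) for pair in coordinates]
        # RFC 7946 requires a linear ring to end on its first position;
        # closing it here is a choice made by this sketch.
        if ring[0] != ring[-1]:
            ring.append(list(ring[0]))
        converted = [ring]
    else:
        raise ValueError(f"Unsupported geometry type: {geometry_type}")
    return {"type": geometry_type, "coordinates": converted}


print(build_geometry("LineString", [[0, 0], ["10.5", "20"]]))
# {'type': 'LineString', 'coordinates': [[0.0, 0.0], [10.5, 20.0]]}
```

The extra list around the polygon ring matches the ``list[list[list[float]]]`` return type documented for ``polygon``.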

extraction_methods.plugins.geometry_to_bbox module

Geometry to Bounding Box Method

class extraction_methods.plugins.geometry_to_bbox.GeometryToBboxExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: geometry_to_bbox

Description:: Accepts a geometry with a type and a list of coordinates and converts it to an RFC 7946, section 5 formatted bbox.

Configuration Options:

- ``geometry``: ``REQUIRED`` geometry to be converted to bbox.
- ``output_key``: key to output to.

Example Configuration:

.. code-block:: yaml

    method: geometry_to_bbox
    inputs:
      geometry:
        type: point
        coordinates:
          - 20
          - 0

get_bbox(coordinate_type: str, coordinates: list[Any]) → list[float]

Get bbox from geometry

Parameters:

coordinate_type (str) – type of coordinates
coordinates (list) – list of coordinates

Returns:

bounding box of coordinates

Return type:

list

input_class: alias of GeometryToBboxInput

line(coordinates: list[list[float]]) → list[float]

Get line bbox

Parameters:: coordinates (list) – list of coordinates
Returns:: bounding box of coordinates
Return type:: list

multi(coordinate_type: str, coordinates: list[Any]) → list[float]

Get bbox of a multi-part geometry

Parameters:

coordinate_type (str) – type of coordinates
coordinates (list) – list of coordinates

Returns:

bounding box of coordinates

Return type:

list

point(coordinates: list[float]) → list[float]

Get point bbox

Parameters:: coordinates (list) – list of coordinates
Returns:: bounding box of coordinates
Return type:: list

polygon(coordinates: list[list[float]]) → list[float]

Get polygon bbox

Parameters:: coordinates (list) – list of coordinates
Returns:: bounding box of coordinates
Return type:: list
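
The helpers above can be illustrated with a minimal sketch. It assumes the RFC 7946 ordering ``[west, south, east, north]`` and GeoJSON-style nesting (a polygon is a list of rings); the function names are hypothetical and are not the plugin's own implementation. Geometries that cross the antimeridian are not handled.

```python
def point_bbox(coordinates: list[float]) -> list[float]:
    # A point gives a degenerate box: west == east and south == north.
    x, y = coordinates[0], coordinates[1]
    return [x, y, x, y]


def line_bbox(coordinates: list[list[float]]) -> list[float]:
    # Min/max over every position gives [west, south, east, north].
    xs = [position[0] for position in coordinates]
    ys = [position[1] for position in coordinates]
    return [min(xs), min(ys), max(xs), max(ys)]


def polygon_bbox(coordinates: list[list[list[float]]]) -> list[float]:
    # Assumption: the first ring is the exterior ring and bounds the polygon.
    return line_bbox(coordinates[0])


print(point_bbox([20, 0]))
print(line_bbox([[0, 0], [10, 5], [-3, 2]]))
```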

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict

class extraction_methods.plugins.geometry_to_bbox.GeometryToBboxInput(*, exists_key: str = '$', exists_delimiter: str = '.', geometry: dict[str, Any] = '$geometry', output_key: str = 'bbox')

Bases: Input

Model for Geometry to Bounding Box Input.

geometry: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'geometry': FieldInfo(annotation=dict[str, Any], required=False, default='$geometry', description='geometry to be converted to bbox.'), 'output_key': FieldInfo(annotation=str, required=False, default='bbox', description='key to output to.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_key: str

extraction_methods.plugins.hash module

Hash Method

class extraction_methods.plugins.hash.HashExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: hash

Description:: Hashes input string.

Configuration Options: .. list-table:

- ``hash_str``: string to be hashed.
- ``output_key``: key to output to.

Example configuration: .. code-block:: yaml

    method: hash
    inputs:
      hash_str: $model
      output_key: hashed_terms

input_class: alias of HashInput

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict
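
As a rough illustration of what hashing a previously extracted term could look like: the algorithm used by the plugin is not stated in this documentation, so SHA-256 and the helper name ``hash_string`` are assumptions made for this sketch only.

```python
import hashlib


def hash_string(hash_str: str) -> str:
    # Assumption: SHA-256 hex digest; the plugin's actual algorithm may differ.
    return hashlib.sha256(hash_str.encode("utf-8")).hexdigest()


# Mimic "hash_str: $model" and "output_key: hashed_terms" from the example configuration.
body = {"model": "example-model"}
body["hashed_terms"] = hash_string(body["model"])
print(body["hashed_terms"])
```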

class extraction_methods.plugins.hash.HashInput(*, exists_key: str = '$', exists_delimiter: str = '.', hash_str: str, output_key: str)

Bases: Input

Model for Hash Input.

hash_str: str

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'hash_str': FieldInfo(annotation=str, required=True, description='string to be hashed.'), 'output_key': FieldInfo(annotation=str, required=True, description='key to output to.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_key: str

extraction_methods.plugins.iso19115 module

ISO 19115 Method

class extraction_methods.plugins.iso19115.ISO19115Extract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: iso19115

Description:: Takes a URL and requests it to retrieve the ISO 19115 record.

Configuration Options: .. list-table:

- ``url``: ``REQUIRED`` URL to record store.
- ``dates``: ``REQUIRED`` List of key and output key pairs for the dates to extract from the response.
- ``request_timeout``: Request timeout. ``default: 15``

Example configuration: .. code-block:: yaml

    method: iso19115
    inputs:
      url: $url
      dates:
        - key: './/gml:beginPosition'
          output_key: start_datetime

input_class: alias of ISO19115Input

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict
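
The date lookup can be sketched with the standard library. This is only an illustration: the HTTP request is left out, the sample record is invented, and the ``gml`` namespace URI and the helper name ``extract_dates`` are assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed prefix mapping; a real record may declare a different GML version.
NAMESPACES = {"gml": "http://www.opengis.net/gml/3.2"}

# Invented stand-in for the record that would be retrieved from the URL.
RECORD = """<record xmlns:gml="http://www.opengis.net/gml/3.2">
  <gml:TimePeriod>
    <gml:beginPosition>2001-01-01T00:00:00</gml:beginPosition>
  </gml:TimePeriod>
</record>"""


def extract_dates(xml_text: str, dates: list[dict[str, str]]) -> dict[str, str]:
    root = ET.fromstring(xml_text)
    output = {}
    for date in dates:
        # find() resolves the "gml:" prefix through the namespaces mapping.
        element = root.find(date["key"], NAMESPACES)
        if element is not None and element.text:
            output[date["output_key"]] = element.text
    return output


extracted = extract_dates(
    RECORD, [{"key": ".//gml:beginPosition", "output_key": "start_datetime"}]
)
print(extracted)
```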

class extraction_methods.plugins.iso19115.ISO19115Input(*, exists_key: str = '$', exists_delimiter: str = '.', url: str, dates: list[KeyOutputKey], request_timeout: int = 15)

Bases: Input

Model for ISO19115 Date Input.

dates: list[KeyOutputKey]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'dates': FieldInfo(annotation=list[KeyOutputKey], required=True, description='list of dates to extract.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'request_timeout': FieldInfo(annotation=int, required=False, default=15, description='request time out.'), 'url': FieldInfo(annotation=str, required=True, description='Url for record store.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

request_timeout: int

url: str

extraction_methods.plugins.iso_date module

ISO Date Method

class extraction_methods.plugins.iso_date.DateTerm(*, input_term: str, format: str = '%Y-%m-%dT%H:%M:%SZ', output_key: str = 'datetime')

Bases: BaseModel

Model for Date terms with format.

format: str

input_term: str

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'format': FieldInfo(annotation=str, required=False, default='%Y-%m-%dT%H:%M:%SZ', description='Format of the date.'), 'input_term': FieldInfo(annotation=str, required=True, description='Term to run method on.'), 'output_key': FieldInfo(annotation=str, required=False, default='datetime', description='Key to output to.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_key: str

class extraction_methods.plugins.iso_date.ISODateExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: iso_date

Description:

Takes the source dict and the key to access the date and converts the date to ISO 8601 Format.

e.g.

- ``YYYY-MM-DDTHH:MM:SS.ffffff``, if microsecond is not 0
- ``YYYY-MM-DDTHH:MM:SS``, if microsecond is 0

If the date format cannot be parsed, it is removed from the source dict with an error logged.

Configuration Options: .. list-table:

- ``date_terms``: ``REQUIRED`` List of keys to the date values. Using a list allows processing of multiple dates.
- ``format``: Optional format string. Default behaviour uses `dateutil.parser.parse <https://dateutil.readthedocs.io/en/stable/parser.html#dateutil.parser.parse>`_.
  If a format string is supplied, this will change to use `datetime.datetime.strptime <https://docs.python.org/3/library/datetime.html#datetime.datetime.strptime>`_.
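
The conversion described above might look roughly like the following sketch. To stay within the standard library it substitutes ``datetime.fromisoformat`` for ``dateutil.parser.parse`` when no format is given, which accepts far fewer inputs; the helper name ``to_iso`` is hypothetical.

```python
from datetime import datetime
from typing import Optional


def to_iso(value: str, fmt: Optional[str] = None) -> Optional[str]:
    try:
        if fmt:
            parsed = datetime.strptime(value, fmt)
        else:
            # Stand-in for dateutil.parser.parse, used here to avoid a dependency.
            parsed = datetime.fromisoformat(value)
    except ValueError:
        # Mirrors the documented behaviour of dropping unparseable dates.
        return None
    # isoformat() only includes microseconds when they are non-zero.
    return parsed.isoformat()


print(to_iso("2012-12-12", "%Y-%m-%d"))
print(to_iso("2012-12-12T10:30:00.250000"))
```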

Example Configuration: .. code-block:: yaml

    method: iso_date
    inputs:
      dates:
        - key: $datetime
          output_key: date
          format: "%Y-%m-%dT%H:%M:%S"
        - key: 2012-12-12
          format: "%Y-%m-%d"

input_class: alias of ISODateInput

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict

class extraction_methods.plugins.iso_date.ISODateInput(*, exists_key: str = '$', exists_delimiter: str = '.', date_terms: list[DateTerm] = [])

Bases: Input

Model for ISO Date Input.

date_terms: list[DateTerm]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'date_terms': FieldInfo(annotation=list[DateTerm], required=False, default=[], description='List of date terms.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

extraction_methods.plugins.json_file module

JSON File Method

class extraction_methods.plugins.json_file.JsonFileExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: json_file

Description:: Takes a list of keys to extract from the JSON file.

Configuration Options: .. list-table:

- ``path``: ``REQUIRED`` Path to directory of JSON files or single JSON file.
- ``properties``: ``REQUIRED`` List of properties to extract.

Example configuration: .. code-block:: yaml

    method: json_file
    inputs:
      path: /path/to/file.json
      properties:
        - key: MIP_ERA
          output_key: mip_era

extract_terms(path: Path) → dict[str, Any]

Extract terms from JSON file(s) at path.

Parameters:: path (Path) – path to file
Returns:: extracted terms
Return type:: dict
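
A minimal sketch of extracting terms from a single JSON file, assuming each property is a ``key``/``output_key`` pair as in the example configuration. The standalone function below is illustrative and handles only top-level keys of one file, not directories.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def extract_terms(path: Path, properties: list[dict[str, str]]) -> dict[str, Any]:
    with path.open() as handle:
        data = json.load(handle)
    # Copy each requested key to its output key, skipping keys that are absent.
    return {
        prop["output_key"]: data[prop["key"]]
        for prop in properties
        if prop["key"] in data
    }


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "file.json"
    sample.write_text(json.dumps({"MIP_ERA": "CMIP6", "other": 1}))
    terms = extract_terms(sample, [{"key": "MIP_ERA", "output_key": "mip_era"}])

print(terms)
```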

find_and_extract() → dict[str, Any]

Find and extract from JSON files.

Returns:: extracted terms
Return type:: dict

input_class: alias of JsonFileInput

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict

class extraction_methods.plugins.json_file.JsonFileInput(*, exists_key: str = '$', exists_delimiter: str = '.', path: str, properties: list[KeyOutputKey])

Bases: Input

Model for JSON File Input.

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'path': FieldInfo(annotation=str, required=True, description='Path to directory of JSON files or single JSON file.'), 'properties': FieldInfo(annotation=list[KeyOutputKey], required=True, description='list of properties to extract.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

path: str

properties: list[KeyOutputKey]

extraction_methods.plugins.lambda module

Lambda Method

class extraction_methods.plugins.lambda.LambdaExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: lambda

Description:: Runs the configured lambda function with the given ``args`` and ``kwargs`` and stores the result under ``output_key``.

Configuration Options: .. list-table:

- ``function``: ``REQUIRED`` lambda function to be run.
- ``output_key``: Optional name of the key you would like to output else
                  response will be merged.
- ``args``: Optional list of arguments for function.
            Use $ for previously extracted terms
- ``kwargs``: Optional dictionary of key word arguments for function.
              Use $ for previously extracted terms

Example Configuration: .. code-block:: yaml

    method: lambda
    inputs:
      function: 'lambda x: x * x'
      args:
        - hello
        - $world
      kwargs:
        hello: world
        goodbye: all

input_class: alias of LambdaInput

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict
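
One way a string such as ``'lambda x: x * x'`` could be turned into a callable and applied is shown below. Whether the plugin uses ``eval`` is an assumption, and ``run_lambda`` is a hypothetical helper; evaluating configuration strings is only safe when the configuration is trusted.

```python
from typing import Any


def run_lambda(function: str, args: list[Any], kwargs: dict[str, Any]) -> Any:
    # Assumption: the function string is evaluated into a callable.
    func = eval(function)
    return func(*args, **kwargs)


# "$width" would be resolved from the body before the call; done by hand here.
body = {"width": 4}
body["area"] = run_lambda("lambda x: x * x", [body["width"]], {})
print(body)
```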

class extraction_methods.plugins.lambda.LambdaInput(*, exists_key: str = '$', exists_delimiter: str = '.', function: str, args: list[Any] = [], kwargs: dict[str, Any] = {}, output_key: str = 'label')

Bases: Input

Model for Lambda Input.

args: list[Any]

function: str

kwargs: dict[str, Any]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'args': FieldInfo(annotation=list[Any], required=False, default=[], description='list of arguments for function.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'function': FieldInfo(annotation=str, required=True, description='lambda function to be run.'), 'kwargs': FieldInfo(annotation=dict[str, Any], required=False, default={}, description='dictionary of key word arguments for function.'), 'output_key': FieldInfo(annotation=str, required=False, default='label', description='key to output to.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_key: str

extraction_methods.plugins.netcdf module

NetCDF Method

class extraction_methods.plugins.netcdf.NetCDFExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: netcdf

Description: Processes XML documents to extract metadata

Configuration Options: .. list-table:

- ``extraction_keys``: List of keys to retrieve from the document.
- ``filter_expr``: Regex to match against files to limit the attempts to known files
- ``namespaces``: Map of namespaces

Extraction Keys:

Extraction keys should be a map.

.. list-table::
   :header-rows: 1

   * - Name
     - Description
   * - ``output_key``
     - Name of the outputted attribute
   * - ``key``
     - Access key to extract the required data. Passed to ``xml.etree.ElementTree.find()`` and also supports XPath formatted accessors
   * - ``attribute``
     - Allows you to select from the element attributes. In the absence of this value, the default behaviour is to access the text value of the key. In some cases, you might want to access an attribute of the element

Example configuration: .. code-block:: yaml

    method: xml
    inputs:
      filter_expr: '.manifest$'
      extraction_keys:
        - name: start_datetime
          key: './/gml:beginPosition'
          attribute: start

input_class: alias of NetCDFInput

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict

class extraction_methods.plugins.netcdf.NetCDFInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri', variable_id: str = '$uri', variable_attributes: list[KeyOutputKey] = [], global_attributes: list[KeyOutputKey] = [], cf_attributes: list[KeyOutputKey] = [], rio_attributes: list[KeyOutputKey] = [])

Bases: Input

Model for NetCDF Input.

cf_attributes: list[KeyOutputKey]

global_attributes: list[KeyOutputKey]

input_term: str

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'cf_attributes': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of cf attributes to extract.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'global_attributes': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of global attributes to extract.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='term for method to run on.'), 'rio_attributes': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of rio attributes to extract.'), 'variable_attributes': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of variable attributes to extract.'), 'variable_id': FieldInfo(annotation=str, required=False, default='$uri', description='lambda function to be run.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

rio_attributes: list[KeyOutputKey]

variable_attributes: list[KeyOutputKey]

variable_id: str

extraction_methods.plugins.open_zip module

Open Zip Method

class extraction_methods.plugins.open_zip.ZipExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: open_zip

Description:: Open a zip file and read inner files

Configuration Options: .. list-table:

- ``input_term``: Term for method to run on. ``default: $uri``
- ``inner_files``: List of inner zipped files to be read.
- ``output_key``: key to output to.

Example configuration: .. code-block:: yaml

    method: open_zip
    inputs:
      input_term: /path/to/a/file
      inner_files:
        - key: hello.txt
          output_key: world

input_class: alias of ZipInput

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict
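
Reading named inner files from an archive can be sketched with the standard ``zipfile`` module. The helper name ``read_inner_files``, the UTF-8 decoding, and the in-memory archive are assumptions made for the sake of a self-contained example.

```python
import io
import zipfile
from typing import Any


def read_inner_files(archive: Any, inner_files: list[dict[str, str]]) -> dict[str, str]:
    output = {}
    with zipfile.ZipFile(archive) as zip_file:
        for inner in inner_files:
            # Each inner file is read in full and stored under its output key.
            with zip_file.open(inner["key"]) as handle:
                output[inner["output_key"]] = handle.read().decode("utf-8")
    return output


# Build a small archive in memory instead of reading /path/to/a/file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zip_file:
    zip_file.writestr("hello.txt", "hi there")
buffer.seek(0)

contents = read_inner_files(buffer, [{"key": "hello.txt", "output_key": "world"}])
print(contents)
```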

class extraction_methods.plugins.open_zip.ZipInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri', inner_files: list[KeyOutputKey] = [], output_key: str = '')

Bases: Input

Model for Zip Input.

check_root_read() → Self

inner_files: list[KeyOutputKey]

input_term: str

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'inner_files': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of inner zipped files to be read.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='term for method to run on.'), 'output_key': FieldInfo(annotation=str, required=False, default='', description='key to output to.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_key: str

extraction_methods.plugins.path_parts module

Path Parts Method

class extraction_methods.plugins.path_parts.PathPartsExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: path_parts

Description:: Extracts the parts of a given path, skipping the first ``skip`` top-level parts.

Configuration Options: .. list-table:

- ``skip``: The number of path parts to skip. ``default: 0``

Example configuration: .. code-block:: yaml

    method: path_parts
    inputs:
      input_term: $uri
      skip: 2

input_class: alias of PathPartsInput

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict
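
The effect of ``skip`` can be shown with ``pathlib``. This sketch returns a plain list of parts and drops the leading root; how the plugin names or stores the parts in the body is not documented here, so the output shape is an assumption.

```python
from pathlib import PurePosixPath


def path_parts(path: str, skip: int = 0) -> list[str]:
    # Drop the root "/" so that skip counts only named directories.
    parts = [part for part in PurePosixPath(path).parts if part != "/"]
    return parts[skip:]


print(path_parts("/badc/cmip6/data/file.nc", skip=2))
```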

class extraction_methods.plugins.path_parts.PathPartsInput(*, exists_key: str = '$', exists_delimiter: str = '.', path: str = '$uri', skip: int = 0)

Bases: Input

Model for Path Parts Input.

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'path': FieldInfo(annotation=str, required=False, default='$uri', description='path for method to run on.'), 'skip': FieldInfo(annotation=int, required=False, default=0, description='number of path parts to skip.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

path: str

skip: int

extraction_methods.plugins.regex module

Regex Method

class extraction_methods.plugins.regex.RegexExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: regex

Description:: Takes an input string and a regex with named capture groups and returns a dictionary of the values extracted using the named capture groups.

Configuration Options: .. list-table:

- ``input_term``: Term for the regex to be run on.
- ``regex``: ``REQUIRED`` The regular expression to match against.

Example configuration: .. code-block:: yaml

    method: regex
    inputs:
      regex: '^(?:[^_]*_){2}(?P<datetime>\d*)'

input_class: alias of RegexInput

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict
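
Named capture groups map directly onto a dictionary through ``re.Match.groupdict()``. The sketch below uses ``re.search`` and an invented file name; whether the plugin uses ``search`` or ``match`` is an assumption.

```python
import re
from typing import Any


def extract_named_groups(input_term: str, regex: str) -> dict[str, Any]:
    match = re.search(regex, input_term)
    # groupdict() maps each named group to the text it captured.
    return match.groupdict() if match else {}


result = extract_named_groups(
    "ab_cd_20200101_v1.nc", r"^(?:[^_]*_){2}(?P<datetime>\d*)"
)
print(result)
```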

class extraction_methods.plugins.regex.RegexInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri', regex: str)

Bases: Input

Model for Regex Input.

input_term: str

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='term for method to run on.'), 'regex': FieldInfo(annotation=str, required=True, description='The regular expression to match against.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

regex: str

extraction_methods.plugins.regex_label module

Regex Label Method

class extraction_methods.plugins.regex_label.RegexLabelExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: regex_label

Description:: Adds the label if the input term is a full match of the regex.

Configuration Options: .. list-table:

- ``input_term``: term for method to run on.
- ``label``: ``REQUIRED`` Label to add if regex passes.
- ``regex``: ``REQUIRED`` Regex to test against.
- ``allow_multiple``: True if multiple labels are allowed.
- ``output_key``: Term for method to output to.

Example configuration: .. code-block:: yaml

    method: regex_label
    inputs:
      label: metadata
      regex: README
      allow_multiple: true

input_class: alias of RegexLabelInput

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict
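
A sketch of full-match labelling follows. The treatment of ``allow_multiple`` (append to a list when true, overwrite with a single value when false) is an assumption, as is the helper name ``add_label``.

```python
import re
from typing import Any


def add_label(
    body: dict[str, Any],
    input_term: str,
    regex: str,
    label: str,
    allow_multiple: bool = True,
    output_key: str = "label",
) -> dict[str, Any]:
    # fullmatch requires the whole input term to match, not just a prefix.
    if re.fullmatch(regex, input_term):
        if allow_multiple:
            body.setdefault(output_key, []).append(label)
        else:
            body[output_key] = label
    return body


print(add_label({}, "README", "README", "metadata"))
print(add_label({}, "README.md", "README", "metadata"))
```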

class extraction_methods.plugins.regex_label.RegexLabelInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri', label: str, regex: str, allow_multiple: bool = True, output_key: str = 'label')

Bases: Input

Model for Regex Label Input.

allow_multiple: bool

input_term: str

label: str

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'allow_multiple': FieldInfo(annotation=bool, required=False, default=True, description='True if multiple labels are allowed.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='term for method to run on.'), 'label': FieldInfo(annotation=str, required=True, description='Label to add if regex passes.'), 'output_key': FieldInfo(annotation=str, required=False, default='label', description='Term for method to output to.'), 'regex': FieldInfo(annotation=str, required=True, description='Regex to test against.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_key: str

regex: str

extraction_methods.plugins.regex_rename module

Regex Rename Method

class extraction_methods.plugins.regex_rename.RegexOutputKey(*, exists_key: str = '$', exists_delimiter: str = '.', regex: str, output_key: str)

Bases: Input

Model for Regex.

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'output_key': FieldInfo(annotation=str, required=True, description='Term for method to output to.'), 'regex': FieldInfo(annotation=str, required=True, description='Regex to test against.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_key: str

regex: str

class extraction_methods.plugins.regex_rename.RegexRenameExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: regex_rename

Description:: Takes a list of regex and output key combinations. Any existing properties that fully match a regex are renamed to the output key. Later regexes take precedence.

Configuration Options: .. list-table:

- ``regex_swaps``: Regex and output key combinations.

Example configuration: .. code-block:: yaml

    method: regex_rename
    inputs:
      regex_swaps:
        - regex: README
          output_key: metadata

add(body: dict[str, Any], key_parts: list[str], value: Any) → dict[str, Any]

Rename terms

Parameters:

body (dict) – current body
key_parts (list) – key parts separated by delimiter
value (Any) – value to add

Returns:

updated body

Return type:

dict

find(body: dict[str, Any], key_parts: list[str]) → tuple[dict[str, Any], Any]

Rename terms

Parameters:

body (dict) – current body
key_parts (list) – key parts separated by delimiter

Returns:

updated body and the value found

Return type:

tuple

input_class: alias of RegexRenameInput

matching_keys(keys: KeysView[str], key_regex: str) → list[str]

Find all keys that match regex

Parameters:

keys (KeysView) – dictionary keys to test
key_regex (str) – regex to test against

Returns:

matching keys

Return type:

list

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict
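
For flat (non-nested) keys, the renaming rule can be sketched as below. Applying the swaps in order means a later swap overwrites an earlier one that targets the same output key, which is one reading of "later regexes take precedence". ``rename_keys`` is a hypothetical helper and ignores the ``delimiter`` option.

```python
import re
from typing import Any


def rename_keys(body: dict[str, Any], regex_swaps: list[dict[str, str]]) -> dict[str, Any]:
    for swap in regex_swaps:
        # Snapshot the matching keys because the dict is modified in the loop.
        for key in [k for k in body if re.fullmatch(swap["regex"], k)]:
            body[swap["output_key"]] = body.pop(key)
    return body


renamed = rename_keys(
    {"README": "text", "size": 1},
    [{"regex": "README", "output_key": "metadata"}],
)
print(renamed)
```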

class extraction_methods.plugins.regex_rename.RegexRenameInput(*, exists_key: str = '$', exists_delimiter: str = '.', regex_swaps: list[RegexOutputKey], delimiter: str = '')

Bases: Input

Model for Regex Rename Input.

delimiter: str

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'delimiter': FieldInfo(annotation=str, required=False, default='', description='delimiter for nested term.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'regex_swaps': FieldInfo(annotation=list[RegexOutputKey], required=True, description='Regex and output key combinations.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

regex_swaps: list[RegexOutputKey]

extraction_methods.plugins.regex_type_cast module

Regex Type Cast Method

class extraction_methods.plugins.regex_type_cast.RegexCastType(*, exists_key: str = '$', exists_delimiter: str = '.', regex: str, cast_type: str)

Bases: Input

Model for Regex Cast Type.

cast_type: str

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'cast_type': FieldInfo(annotation=str, required=True, description='Python type to cast to.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'regex': FieldInfo(annotation=str, required=True, description='Regex to test against.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

regex: str

class extraction_methods.plugins.regex_type_cast.RegexTypeCastExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: regex_type_cast

Description:: Takes a list of regex and cast type combinations. Any existing properties that fully match a regex are cast to the associated type.

Configuration Options: .. list-table:

- ``regex_casts``: Regex and cast type combinations.

Example configuration: .. code-block:: yaml

    method: regex_type_cast
    inputs:
      regex_casts:
        - regex: cloud_cover
          cast_type: int

input_class: alias of RegexTypeCastInput

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict
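
A sketch of casting matched properties, assuming ``cast_type`` names a Python built-in. The ``CASTS`` lookup table and the helper name ``cast_keys`` are illustrative; how the plugin resolves the type name is not documented here.

```python
import re
from typing import Any

# Assumption: cast_type strings map onto these built-in types.
CASTS = {"int": int, "float": float, "str": str, "bool": bool}


def cast_keys(body: dict[str, Any], regex_casts: list[dict[str, str]]) -> dict[str, Any]:
    for cast in regex_casts:
        for key in body:
            # Only keys that fully match the regex are cast.
            if re.fullmatch(cast["regex"], key):
                body[key] = CASTS[cast["cast_type"]](body[key])
    return body


cast_body = cast_keys(
    {"cloud_cover": "12", "name": "scene"},
    [{"regex": "cloud_cover", "cast_type": "int"}],
)
print(cast_body)
```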

class extraction_methods.plugins.regex_type_cast.RegexTypeCastInput(*, exists_key: str = '$', exists_delimiter: str = '.', regex_casts: list[RegexCastType])

Bases: Input

Model for Regex Cast Type Input.

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'regex_casts': FieldInfo(annotation=list[RegexCastType], required=True, description='Regex and cast type combinations.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

regex_casts: list[RegexCastType]

extraction_methods.plugins.remove module

Remove Method

class extraction_methods.plugins.remove.RemoveExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: remove

Description:: Removes keys from the body.

Configuration Options: .. list-table:

- ``keys``: ``REQUIRED`` list of keys to remove.
- ``delimiter``: delimiter for nested key.

Example Configuration: .. code-block:: yaml

    method: remove
    inputs:
      keys:
        - hello
        - world

input_class: alias of RemoveInput

matching_keys(keys: KeysView[str], key_regex: str) → list[str]

Find all keys that match regex

Parameters:

keys (KeysView) – dictionary keys to test
key_regex (str) – regex to test against

Returns:

matching keys

Return type:

list

remove_key(body: dict[str, Any], key_parts: list[str]) → dict[str, Any]

Remove nested terms

Parameters:

body (dict) – current body
key_parts (list) – key parts separated by delimiter

Returns:

updated body

Return type:

dict
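One way to picture nested removal is a recursive walk over the key parts. The sketch below is only an assumption about the behaviour, not the plugin's code: ``remove_key_sketch`` is a hypothetical name, the body is modified in place, and missing keys are silently ignored.

```python
from typing import Any


def remove_key_sketch(body: dict[str, Any], key_parts: list[str]) -> dict[str, Any]:
    """Remove the nested term addressed by key_parts, if it exists."""
    if not key_parts:
        return body
    head, *rest = key_parts
    if not rest:
        # Last part: drop the key itself; a missing key is ignored here.
        body.pop(head, None)
    elif isinstance(body.get(head), dict):
        # Descend one level and continue with the remaining parts.
        remove_key_sketch(body[head], rest)
    return body


body = {"hello": {"world": 1, "there": 2}, "keep": 3}
result = remove_key_sketch(body, "hello.world".split("."))
print(result)
```

Here the key ``hello.world`` is split on the default ``.`` delimiter before being passed in as ``key_parts``.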

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict

class extraction_methods.plugins.remove.RemoveInput(*, exists_key: str = '$', exists_delimiter: str = '.', keys: list[str], delimiter: str = '.')

Bases: Input

Model for Remove Input.

delimiter: str

keys: list[str]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'delimiter': FieldInfo(annotation=str, required=False, default='.', description='delimiter for nested term.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'keys': FieldInfo(annotation=list[str], required=True, description='list of keys to remove.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

extraction_methods.plugins.stac_extension module

STAC Extension Method

class extraction_methods.plugins.stac_extension.STACExtension(*, url: str, prefix: str, properties: list[str])

Bases: BaseModel

Model for STAC Extension.

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'prefix': FieldInfo(annotation=str, required=True, description='Extension prefix.'), 'properties': FieldInfo(annotation=list[str], required=True, description='Extension properties.'), 'url': FieldInfo(annotation=str, required=True, description='Extension URL.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

prefix: str

properties: list[str]

url: str

class extraction_methods.plugins.stac_extension.STACExtensionExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: stac_extension

Description:: Accepts a list of extensions, each containing a url, a prefix and a list of properties.

Configuration Options: .. list-table:

- ``extensions``: ``REQUIRED`` List of extensions.

Example Configuration:

.. code-block:: yaml

    method: stac_extension
    inputs:
      extensions:
        - url: hello.com/v1.0.0/world.json
          prefix: hello
          properties:
            - foo
            - bar

input_class: alias of STACExtensionInput

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict

class extraction_methods.plugins.stac_extension.STACExtensionInput(*, exists_key: str = '$', exists_delimiter: str = '.', extensions: list[STACExtension])

Bases: Input

Model for STAC Extension Input.

extensions: list[STACExtension]

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'extensions': FieldInfo(annotation=list[STACExtension], required=True, description='List of extensions.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

extraction_methods.plugins.string_template module

String Template Method

class extraction_methods.plugins.string_template.StringTemplateExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: string_template

Description:: Accepts a template and an output_key. Terms are substituted into the template and the result is written to output_key.

Configuration Options: .. list-table:

- ``template``: ``REQUIRED`` Template to follow.
- ``descructive``: True if terms should be removed after templating.
- ``output_key``: ``REQUIRED`` key to output to.

Example Configuration:

.. code-block:: yaml

    method: string_template
    inputs:
      template: "{hello}/{goodbye}/{hello}/bonjour.html"
      output_key: manifest_url

input_class: alias of StringTemplateInput

run(body: dict[str, Any]) → Any

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict
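The templating step can be sketched with Python's built-in ``str.format``. This is an illustration under assumptions, not the plugin's implementation: ``string_template_sketch`` is a hypothetical name, the template is assumed to use ``str.format`` placeholders, and the ``destructive`` argument stands in for the option spelled ``descructive`` in the configuration.

```python
from string import Formatter
from typing import Any


def string_template_sketch(
    body: dict[str, Any], template: str, output_key: str, destructive: bool = False
) -> dict[str, Any]:
    """Fill template from body and store the result under output_key."""
    # Collect the term names referenced in the template, e.g. {"hello", "goodbye"}.
    terms = {name for _, name, _, _ in Formatter().parse(template) if name}
    body[output_key] = template.format(**body)
    if destructive:
        # Drop the terms that were consumed by the template.
        for term in terms:
            body.pop(term, None)
    return body


kept = string_template_sketch(
    {"hello": "a", "goodbye": "b"}, "{hello}/{goodbye}/{hello}/bonjour.html", "manifest_url"
)
removed = string_template_sketch(
    {"hello": "a", "goodbye": "b"},
    "{hello}/{goodbye}/{hello}/bonjour.html",
    "manifest_url",
    destructive=True,
)
print(kept)
print(removed)
```

In the second call the source terms are removed after templating, so only ``manifest_url`` remains in the body.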

class extraction_methods.plugins.string_template.StringTemplateInput(*, exists_key: str = '$', exists_delimiter: str = '.', template: str, descructive: bool = False, output_key: str)

Bases: Input

Model for String Template Input.

descructive: bool

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'descructive': FieldInfo(annotation=bool, required=False, default=False, description='True if terms should be removed after templating.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'output_key': FieldInfo(annotation=str, required=True, description='key to output to.'), 'template': FieldInfo(annotation=str, required=True, description='Template to follow.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_key: str

template: str

extraction_methods.plugins.xml module

XML Method

class extraction_methods.plugins.xml.XMLExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: xml

Description:: Processes XML documents to extract metadata

Configuration Options: .. list-table:

- ``input_term``: Term for method to run on. Defaults to ``$uri``.
- ``properties``: ``REQUIRED`` List of properties to retrieve from the document.
- ``namespaces``: ``REQUIRED`` Map of namespaces.

Extraction Keys:

Extraction keys should be a map.

.. list-table::
   :header-rows: 1

   * - Name
     - Description
   * - ``key``
     - Key of the property. Passed to ``xml.etree.ElementTree.find()``; XPath-formatted accessors are also supported.
   * - ``output_key``
     - Key to output to.
   * - ``attribute``
     - Attribute of the element to select. If omitted, the default behaviour is to use the text value of the element matched by ``key``; set this when you need one of the element's attributes instead.

Example configuration:

.. code-block:: yaml

    method: xml
    inputs:
      properties:
        - name: start_datetime
          key: './/gml:beginPosition'
          attribute: start

input_class: alias of XMLInput

run(body: dict[str, Any]) → dict[str, Any]

Run the method.

Parameters:: body (dict) – current generated properties
Returns:: updated body dict
Return type:: dict
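The lookup described under Extraction Keys can be sketched with the standard library alone. The example below is a minimal illustration, not the plugin's code: ``extract_property`` is a hypothetical helper, and the sample document, the namespace URI and the ``frame`` attribute are made up for the demonstration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document used only for this illustration.
DOCUMENT = """<root xmlns:gml="http://www.opengis.net/gml">
  <gml:TimePeriod>
    <gml:beginPosition frame="UTC">2020-01-01T00:00:00Z</gml:beginPosition>
  </gml:TimePeriod>
</root>"""


def extract_property(root, key, namespaces, attribute=""):
    """Return the element's text, or the named attribute if one is given."""
    element = root.find(key, namespaces)
    if element is None:
        return None
    return element.get(attribute) if attribute else element.text


root = ET.fromstring(DOCUMENT)
namespaces = {"gml": "http://www.opengis.net/gml"}
text_value = extract_property(root, ".//gml:beginPosition", namespaces)
attr_value = extract_property(root, ".//gml:beginPosition", namespaces, "frame")
print(text_value, attr_value)
```

The ``namespaces`` map plays the same role as the ``namespaces`` option: it resolves the ``gml:`` prefix used in the key.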

class extraction_methods.plugins.xml.XMLInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri', properties: list[XMLProperty], namespaces: dict[str, str])

Bases: Input

Model for XML Input.

input_term: str

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='Term for method to run on.'), 'namespaces': FieldInfo(annotation=dict[str, str], required=True, description='Map of namespaces.'), 'properties': FieldInfo(annotation=list[XMLProperty], required=True, description='List of properties to retrieve from the document.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

namespaces: dict[str, str]

properties: list[XMLProperty]

class extraction_methods.plugins.xml.XMLProperty(*, key: str, output_key: str = '', attribute: str = '')

Bases: KeyOutputKey

Model for XML property.

attribute: str

model_config: ClassVar[ConfigDict] = {}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'attribute': FieldInfo(annotation=str, required=False, default='', description='Attribute of the XML property.'), 'key': FieldInfo(annotation=str, required=True), 'output_key': FieldInfo(annotation=str, required=False, default='')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.
