extraction_methods.plugins package

Subpackages

Submodules

extraction_methods.plugins.bbox module

Bounding Box Method

class extraction_methods.plugins.bbox.BboxExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: bbox

Description:

Converts coordinate values to an RFC 7946, section 5 formatted bbox.

Configuration Options:

- ``west``: ``REQUIRED`` Most westerly coordinate
- ``south``: ``REQUIRED`` Most southerly coordinate
- ``east``: ``REQUIRED`` Most easterly coordinate
- ``north``: ``REQUIRED`` Most northerly coordinate

Example Configuration:

  - method: bbox
    inputs:
      west: 0
      south: 0
      east: $east_variable
      north: $north_variable

input_class

alias of BboxInput

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict
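As a rough illustration of the conversion described above, the sketch below orders four coordinates as ``[west, south, east, north]``, the two-dimensional bbox order given in RFC 7946, section 5. The helper names ``resolve_term`` and ``make_bbox``, and the way ``$``-prefixed values are looked up in the body, are assumptions made for this example and not the plugin's actual implementation.

```python
from typing import Any


def resolve_term(value: Any, body: dict[str, Any], exists_key: str = "$") -> Any:
    """Hypothetical helper: look up '$name' in body, pass other values through."""
    if isinstance(value, str) and value.startswith(exists_key):
        return body[value[len(exists_key):]]
    return value


def make_bbox(inputs: dict[str, Any], body: dict[str, Any]) -> list[float]:
    """Return [west, south, east, north], the 2D order from RFC 7946 section 5."""
    return [
        float(resolve_term(inputs[key], body))
        for key in ("west", "south", "east", "north")
    ]


# Values extracted by earlier methods (illustrative only).
body = {"east_variable": 10.5, "north_variable": 20}
inputs = {"west": 0, "south": 0, "east": "$east_variable", "north": "$north_variable"}
bbox = make_bbox(inputs, body)
```

Here ``bbox`` mirrors the example configuration, with the two ``$`` terms taken from the body.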

class extraction_methods.plugins.bbox.BboxInput(*, exists_key: str = '$', exists_delimiter: str = '.', west: float | str, south: float | str, east: float | str, north: float | str)

Bases: Input

Model for BBox Method Input.

east: float | str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'east': FieldInfo(annotation=Union[float, str], required=True, description='east coordinate.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'north': FieldInfo(annotation=Union[float, str], required=True, description='north coordinate.'), 'south': FieldInfo(annotation=Union[float, str], required=True, description='south coordinate.'), 'west': FieldInfo(annotation=Union[float, str], required=True, description='west coordinate.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

north: float | str
south: float | str
west: float | str

extraction_methods.plugins.ceda_observation module

CEDA Observation Method

class extraction_methods.plugins.ceda_observation.CEDAObservationExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: ceda_observation

Description:

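Returns a CEDA observation record for the input_term.

Configuration Options:

- ``input_term``: Term for the method to run on. Default ``$uri``
- ``request_timeout``: Request timeout. Default ``15``
- ``output_key``: Key to output to. Default ``uuid``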

Example Configuration:

  - method: ceda_observation
    inputs:
      input_term: $url

input_class

alias of CEDAObservationInput

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict

class extraction_methods.plugins.ceda_observation.CEDAObservationInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri', request_timeout: int = 15, output_key: str = 'uuid')

Bases: Input

Model for CEDA Observation Method Input.

input_term: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='term for method to run on.'), 'output_key': FieldInfo(annotation=str, required=False, default='uuid', description='key to output to.'), 'request_timeout': FieldInfo(annotation=int, required=False, default=15, description='request time out.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_key: str
request_timeout: int

extraction_methods.plugins.ceda_vocabulary module

CEDA Vocabulary Method

class extraction_methods.plugins.ceda_vocabulary.CEDAVocabularyExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: ceda_vocabulary

Description:

Validates and sorts properties into vocabs and generates the general vocab for specified properties.

Configuration Options:

- ``url``: ``REQUIRED`` URL of vocabulary server
- ``namespace``: ``REQUIRED`` Namespace of vocab for terms
- ``terms``: Terms to be validated
- ``strict``: Boolean on whether values should be validated
- ``request_timeout``: Request timeout

Example Configuration:

  - method: ceda_vocabulary
    inputs:
      url: vocab.ceda.ac.uk
      namespace: cmip6
      strict: False
      terms:
        - start_time
        - model

input_class

alias of CEDAVocabularyInput

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict

class extraction_methods.plugins.ceda_vocabulary.CEDAVocabularyInput(*, exists_key: str = '$', exists_delimiter: str = '.', url: str, namespace: str, strict: bool = False, terms: list[str] = [], request_timeout: int = 15)

Bases: Input

Model for CEDA Vocab Method Input.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'namespace': FieldInfo(annotation=str, required=True, description='Namespace for vocab terms.'), 'request_timeout': FieldInfo(annotation=int, required=False, default=15, description='request time out.'), 'strict': FieldInfo(annotation=bool, required=False, default=False, description='True if values should be validated.'), 'terms': FieldInfo(annotation=list[str], required=False, default=[], description='terms to be validated.'), 'url': FieldInfo(annotation=str, required=True, description='URL of vocabulary server.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

namespace: str
request_timeout: int
strict: bool
terms: list[str]
url: str

extraction_methods.plugins.controlled_vocabulary module

extraction_methods.plugins.datetime_bound_to_centroid module

Datetime Bound to Centroid Method

class extraction_methods.plugins.datetime_bound_to_centroid.DatetimeBoundToCentroidExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: datetime_bound_to_centroid

Description:

Accepts start and end datetime bounds and returns the centroid (midpoint) datetime between them.

Configuration Options:

- ``start_datetime``: Start datetime bound
- ``start_format``: Format of the start datetime
- ``end_datetime``: End datetime bound
- ``end_format``: Format of the end datetime
- ``output_key``: Term for method to output to
- ``output_format``: Format of the output datetime

Example Configuration:

  - method: datetime_bound_to_centroid
    inputs:
      start_datetime: $start_date
      end_datetime: "2022-02-02"
      end_format: "%Y-%m-%d"
      output_key: polygon

input_class

alias of DatetimeBoundToCentroidInput

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict

strip_time(datetime_str: str, datetime_format: str) datetime

Convert a datetime string to a datetime object.

Parameters:
  • datetime_str (str) – string to convert to datetime

  • datetime_format (str) – format of datetime string

Returns:

datetime object

Return type:

datetime
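The sketch below shows one way the centroid of two datetime bounds could be computed, using the default formats from ``DatetimeBoundToCentroidInput``. Treating the centroid as the midpoint of the interval is an assumption, and ``datetime_centroid`` is a hypothetical name, not part of the plugin.

```python
from datetime import datetime


def datetime_centroid(
    start: str,
    end: str,
    start_format: str = "%Y-%m-%dT%H:%M:%S",
    end_format: str = "%Y-%m-%dT%H:%M:%S",
    output_format: str = "%Y-%m-%dT%H:%M:%SZ",
) -> str:
    """Assumed behaviour: return the midpoint between two datetime bounds."""
    start_dt = datetime.strptime(start, start_format)
    end_dt = datetime.strptime(end, end_format)
    # Half of the interval added to the start gives the midpoint.
    midpoint = start_dt + (end_dt - start_dt) / 2
    return midpoint.strftime(output_format)


centroid = datetime_centroid(
    "2022-01-31T00:00:00", "2022-02-02", end_format="%Y-%m-%d"
)
```

With a start of 31 January 2022 and an end of 2 February 2022, ``centroid`` falls on 1 February 2022.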

class extraction_methods.plugins.datetime_bound_to_centroid.DatetimeBoundToCentroidInput(*, exists_key: str = '$', exists_delimiter: str = '.', start_datetime: str = '$start_datetime', start_format: str = '%Y-%m-%dT%H:%M:%S', end_datetime: str = '$end_datetime', end_format: str = '%Y-%m-%dT%H:%M:%S', output_key: str = 'datetime', output_format: str = '%Y-%m-%dT%H:%M:%SZ')

Bases: Input

Model for Datetime Bound to Centroid Method Input.

end_datetime: str
end_format: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'end_datetime': FieldInfo(annotation=str, required=False, default='$end_datetime', description='End datetime bound.'), 'end_format': FieldInfo(annotation=str, required=False, default='%Y-%m-%dT%H:%M:%S', description='Format of end datetime.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'output_format': FieldInfo(annotation=str, required=False, default='%Y-%m-%dT%H:%M:%SZ', description='format of output.'), 'output_key': FieldInfo(annotation=str, required=False, default='datetime', description='key to output to.'), 'start_datetime': FieldInfo(annotation=str, required=False, default='$start_datetime', description='Start datetime bound.'), 'start_format': FieldInfo(annotation=str, required=False, default='%Y-%m-%dT%H:%M:%S', description='Format for start datetime.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_format: str
output_key: str
start_datetime: str
start_format: str

extraction_methods.plugins.default module

Default Method

class extraction_methods.plugins.default.DefaultExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: default

Description:

Adds a set of default facets to the body.

Configuration Options:

- ``defaults``: Dictionary of defaults to be added

Example Configuration:

  - method: default
    inputs:
      defaults:
        mip_era: CMIP6

input_class

alias of DefaultInput

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict

class extraction_methods.plugins.default.DefaultInput(*, exists_key: str = '$', exists_delimiter: str = '.', defaults: dict[str, Any])

Bases: Input

Model for Default Method Input.

defaults: dict[str, Any]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'defaults': FieldInfo(annotation=dict[str, Any], required=True, description='Defaults to be added.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

extraction_methods.plugins.dict_aggregator module

Dictionary Aggregator Method

class extraction_methods.plugins.dict_aggregator.DictAggregatorExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: dict_aggregator

Description:

Aggregate information within dictionary.

Configuration Options:

- ``min``: list of terms for which the minimum of their aggregate should be returned
- ``max``: list of terms for which the maximum of their aggregate should be returned
- ``sum``: list of terms for which the sum of their aggregate should be returned
- ``list``: list of terms for which a list of their aggregate should be returned
- ``mean``: list of terms for which the mean of their summed aggregate should be returned

Configuration Example:

  - method: dict_aggregator
    inputs:
      min:
        - start_time
      max:
        - end_time
      sum:
        - size
      list:
        - term1
        - term2

input_class

alias of DictAggregatorInput

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict
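To make the aggregation options more concrete, here is a minimal sketch that applies ``min``, ``max``, ``sum``, ``mean`` and ``list`` across a dictionary of assets. The shape of the input (a dictionary of per-asset dictionaries), the de-duplication in ``list``, and the names ``AGGREGATORS`` and ``aggregate`` are assumptions for illustration; the plugin's ``KeyOutputKey`` handling is not modelled.

```python
from statistics import mean
from typing import Any, Callable

# Aggregation name -> function applied to the collected values.
AGGREGATORS: dict[str, Callable[[list[Any]], Any]] = {
    "min": min,
    "max": max,
    "sum": sum,
    "mean": mean,
    # Assumption: the list aggregation returns sorted, de-duplicated values.
    "list": lambda values: sorted(set(values)),
}


def aggregate(
    assets: dict[str, dict[str, Any]], config: dict[str, list[str]]
) -> dict[str, Any]:
    """Apply each configured aggregation to the named term across all assets."""
    output: dict[str, Any] = {}
    for agg_name, terms in config.items():
        for term in terms:
            values = [asset[term] for asset in assets.values() if term in asset]
            if values:  # skip terms that no asset provides
                output[term] = AGGREGATORS[agg_name](values)
    return output


assets = {
    "a.nc": {"start_time": "2020-01-01", "end_time": "2020-06-30", "size": 10},
    "b.nc": {"start_time": "2019-06-01", "end_time": "2021-01-01", "size": 30},
}
summary = aggregate(
    assets, {"min": ["start_time"], "max": ["end_time"], "sum": ["size"]}
)
```

ISO 8601 date strings sort chronologically, which is why ``min`` and ``max`` work on them directly in this sketch.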

class extraction_methods.plugins.dict_aggregator.DictAggregatorInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str | dict[str, Any] = '$assets', min: list[KeyOutputKey] = [], max: list[KeyOutputKey] = [], sum: list[KeyOutputKey] = [], mean: list[KeyOutputKey] = [], bucket: list[KeyOutputKey] = [])

Bases: Input

Model for Dictionary Aggregator Method Input.

bucket: list[KeyOutputKey]
input_term: str | dict[str, Any]
max: list[KeyOutputKey]
mean: list[KeyOutputKey]
min: list[KeyOutputKey]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'bucket': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the list of their aggregate should be returned.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'input_term': FieldInfo(annotation=Union[str, dict[str, Any]], required=False, default='$assets', description='term for method to run on.'), 'max': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the maximum of their aggregate should be returned.'), 'mean': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the mean of their summed aggregate should be returned.'), 'min': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the minimum of their aggregate should be returned.'), 'sum': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the sum of their aggregate should be returned.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

sum: list[KeyOutputKey]

extraction_methods.plugins.elasticsearch_aggregation module

Elasticsearch Aggregation Method

class extraction_methods.plugins.elasticsearch_aggregation.ElasticsearchAggregationExtract(**kwargs: Any)

Bases: ExtractionMethod

Method: elasticsearch_aggregation

Description:

Using an ID, generates a summary of information for higher-level entities.

Configuration Options:

- ``index``: Name of the index holding the STAC entities
- ``id_term``: Term used for aggregating the STAC entities
- ``client_kwargs``: Session parameters passed to
  `elasticsearch.Elasticsearch <https://elasticsearch-py.readthedocs.io/en/7.10.0/api.html>`_
- ``bbox``: list of terms for which their aggregate bbox should be returned
- ``min``: list of terms for which the minimum of their aggregate should be returned
- ``max``: list of terms for which the maximum of their aggregate should be returned
- ``sum``: list of terms for which the sum of their aggregate should be returned
- ``list``: list of terms for which a list of their aggregate should be returned

Configuration Example:

  - method: elasticsearch_aggregation
    inputs:
      index: ceda-index
      id_term: item_id
      client_kwargs:
        hosts: ['host1:9200', 'host2:9200']
      bbox:
        - bbox
      min:
        - start_time
      max:
        - end_time
      sum:
        - size
      list:
        - term1
        - term2

base_query() dict[str, Any]

Base query to filter the results to a single collection.

Returns:

base query

Return type:

dict

static basic_aggregation(agg_type: str, facet: KeyOutputKey) dict[str, Any]

Query to retrieve a basic aggregation of the given type (for example, the minimum value) from docs.

Parameters:
  • agg_type (str) – type of aggregation

  • facet (KeyOutputKey) – facet to aggregate

Returns:

basic aggregation query

Return type:

dict
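For orientation, a single-metric Elasticsearch aggregation such as ``min``, ``max`` or ``sum`` is a small nested dictionary. The sketch below builds one from plain strings; passing ``key`` and ``output_key`` as separate arguments instead of a ``KeyOutputKey`` object, and the ``properties.`` field prefix, are assumptions made for this example.

```python
from typing import Any


def basic_aggregation(agg_type: str, key: str, output_key: str) -> dict[str, Any]:
    """Build a single-metric aggregation named output_key over the field key."""
    return {output_key: {agg_type: {"field": key}}}


# "size": 0 asks Elasticsearch for aggregation results only, with no hits.
query: dict[str, Any] = {"size": 0, "aggs": {}}
query["aggs"].update(basic_aggregation("min", "properties.start_time", "start_time"))
query["aggs"].update(basic_aggregation("max", "properties.end_time", "end_time"))
```

Each call contributes one named entry under ``aggs``, so several metrics can be requested in one search.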

construct_query() dict[str, Any]

Function to create the initial elasticsearch query.

Returns:

aggregation query

Return type:

dict

extract_facet(aggregations: dict[str, Any], facet: KeyOutputKey) Any

Function to extract the given facets from the aggregation.

Parameters:
  • aggregations (dict) – aggregations

  • facet (KeyOutputKey) – facet to be extracted

Returns:

extracted facet

Return type:

Any

extract_facet_lists(query: dict[str, Any], aggregations: dict[str, Any], facets: list[KeyOutputKey]) dict[str, Any]

Function to extract the lists of given facets from the aggregation.

Parameters:
  • query (dict) – attribute dictionary to update

  • aggregations (dict) – current generated properties

  • facets (list) – facets to be extracted

Returns:

extracted list facets

Return type:

dict

extract_first_facet(properties: dict[str, Any], facet: KeyOutputKey) Any

Function to extract the given default facets from the first hit.

Parameters:
  • properties (dict) – properties from first record

  • facet (KeyOutputKey) – current facet to be extracted

Returns:

extracted facet

Return type:

Any

extract_metadata(query: dict[str, Any], result: dict[str, Any]) dict[str, Any]

Function to extract the required metadata from the returned query result.

Parameters:
  • query (dict) – previous query

  • result (dict) – results from previous query

Returns:

metadata

Return type:

dict

static facet_composite_aggregation(facet: KeyOutputKey) dict[str, Any]

Generate the composite aggregation for the facet.

Parameters:

facet (KeyOutputKey) – facet to aggregate

Returns:

composite aggregation query

Return type:

dict
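A composite aggregation pages through every distinct value of a field, which is one way to collect a list of facet values. The following sketch shows the general query shape; the ``size`` default, the string arguments in place of ``KeyOutputKey``, and the ``with_after_key`` helper are assumptions and may differ from what the plugin generates.

```python
from typing import Any


def facet_composite_aggregation(
    key: str, output_key: str, size: int = 100
) -> dict[str, Any]:
    """Build a composite aggregation with one terms source for the field key."""
    return {
        output_key: {
            "composite": {
                "sources": [{output_key: {"terms": {"field": key}}}],
                "size": size,
            }
        }
    }


def with_after_key(
    aggregation: dict[str, Any], output_key: str, after_key: dict[str, Any]
) -> dict[str, Any]:
    """Hypothetical helper: request the next page using the returned after_key."""
    aggregation[output_key]["composite"]["after"] = after_key
    return aggregation


agg = facet_composite_aggregation("properties.model.keyword", "model")
next_page = with_after_key(agg, "model", {"model": "last-seen-value"})
```

Elasticsearch returns an ``after_key`` with each page of composite results; passing it back as ``after`` fetches the following page.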

run(body: dict[str, Any]) dict[str, Any]

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict

class extraction_methods.plugins.elasticsearch_aggregation.ElasticsearchAggregationInput(*, exists_key: str = '$', exists_delimiter: str = '.', index: str, id_term: str, client_kwargs: dict[str, Any] = {}, search_query: dict[str, Any] = {'bool': {'must': [{'term': {'path': {'value': '$uri'}}}], 'must_not': [{'term': {'categories.keyword': {'value': 'hidden'}}}]}}, geo_bound: list[KeyOutputKey] = [], first: list[KeyOutputKey] = [], min: list[KeyOutputKey] = [], max: list[KeyOutputKey] = [], sum: list[KeyOutputKey] = [], mean: list[KeyOutputKey] = [], bucket: list[KeyOutputKey] = [], request_tiemout: int = 15, allow_multiple: bool = True, output_key: str = 'label')

Bases: Input

Model for Elasticsearch Aggregation Input.

allow_multiple: bool
bucket: list[KeyOutputKey]
client_kwargs: dict[str, Any]
first: list[KeyOutputKey]
geo_bound: list[KeyOutputKey]
id_term: str
index: str
max: list[KeyOutputKey]
mean: list[KeyOutputKey]
min: list[KeyOutputKey]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'allow_multiple': FieldInfo(annotation=bool, required=False, default=True, description='True if multiple labels are allowed.'), 'bucket': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the list of their aggregate should be returned.'), 'client_kwargs': FieldInfo(annotation=dict[str, Any], required=False, default={}, description='Parameters passed to elasticsearch client.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'first': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description="list of terms for which the first record's value should be returned."), 'geo_bound': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the minimum of their aggregate should be returned.'), 'id_term': FieldInfo(annotation=str, required=True, description='Term used for agregating the STAC entities.'), 'index': FieldInfo(annotation=str, required=True, description='Name of the index holding the STAC entities.'), 'max': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the maximum of their aggregate should be returned.'), 'mean': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the mean of their summed aggregate should be returned.'), 'min': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the minimum of their aggregate should be returned.'), 'output_key': FieldInfo(annotation=str, required=False, default='label', description='key to output to.'), 'request_tiemout': FieldInfo(annotation=int, required=False, default=15, 
description='Time out for search.'), 'search_query': FieldInfo(annotation=dict[str, Any], required=False, default={'bool': {'must_not': [{'term': {'categories.keyword': {'value': 'hidden'}}}], 'must': [{'term': {'path': {'value': '$uri'}}}]}}, description='Session parameters passed to elasticsearch client.'), 'sum': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the sum of their aggregate should be returned.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_key: str
request_tiemout: int
search_query: dict[str, Any]
sum: list[KeyOutputKey]

extraction_methods.plugins.facet_map module

Facet Map Method

class extraction_methods.plugins.facet_map.FacetMapExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: facet_map

Description:

In some cases, you may wish to map the header attributes to different facets. This method takes a map and converts the facet labels into those specified.

Configuration Options:

- ``term_map``: Dictionary of terms to map.

Example Configuration:

  - method: facet_map
    inputs:
      term_map:
        old_key: new_key
        time_coverage_start: start_time

input_class

alias of FacetMapInput

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict
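The mapping described above amounts to renaming keys. A minimal sketch, assuming keys not listed in ``term_map`` are left unchanged (``map_facets`` is a hypothetical name):

```python
from typing import Any


def map_facets(body: dict[str, Any], term_map: dict[str, str]) -> dict[str, Any]:
    """Rename keys found in term_map; leave all other keys as they are."""
    return {term_map.get(key, key): value for key, value in body.items()}


body = {"time_coverage_start": "2020-01-01", "model": "UKESM1-0-LL"}
mapped = map_facets(body, {"time_coverage_start": "start_time"})
```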

class extraction_methods.plugins.facet_map.FacetMapInput(*, exists_key: str = '$', exists_delimiter: str = '.', term_map: dict[str, str] = {})

Bases: Input

Model for Facet Map Input.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'term_map': FieldInfo(annotation=dict[str, str], required=False, default={}, description='Dictionary of terms to be mapped.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

term_map: dict[str, str]

extraction_methods.plugins.facet_prefix module

Facet Prefix Method

class extraction_methods.plugins.facet_prefix.FacetPrefixExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: facet_prefix

Description:

In some cases, you may wish to add a prefix to some or all of the facets based on the vocabulary they’re from.

Configuration Options:

- ``prefix``: Prefix to be added.
- ``keys``: List of keys that require prefix.

Example Configuration:

  - method: facet_prefix
    inputs:
      prefix: cmip6
      keys:
        - start_time
        - model

input_class

alias of FacetPrefixInput

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict
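A sketch of the prefixing step follows. How the prefix is joined to the key is not stated here, so the ``:`` separator is purely an assumption, as is the name ``prefix_facets``.

```python
from typing import Any


def prefix_facets(
    body: dict[str, Any], prefix: str, keys: list[str], separator: str = ":"
) -> dict[str, Any]:
    """Add prefix to the listed keys only (separator is an assumed convention)."""
    return {
        (f"{prefix}{separator}{key}" if key in keys else key): value
        for key, value in body.items()
    }


body = {"start_time": "2020-01-01", "model": "UKESM1-0-LL", "size": 10}
prefixed = prefix_facets(body, "cmip6", ["start_time", "model"])
```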

class extraction_methods.plugins.facet_prefix.FacetPrefixInput(*, exists_key: str = '$', exists_delimiter: str = '.', prefix: str, keys: list[str])

Bases: Input

Model for Facet Prefix Input.

keys: list[str]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'keys': FieldInfo(annotation=list[str], required=True, description='list of keys that require prefix.'), 'prefix': FieldInfo(annotation=str, required=True, description='Prefix to be added.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

prefix: str

extraction_methods.plugins.general_function module

General Function Method

class extraction_methods.plugins.general_function.Function(*, name: str, args: list[Any] = [], kwargs: dict[str, Any] = {})

Bases: BaseModel

Model for Function.

args: list[Any]
kwargs: dict[str, Any]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'args': FieldInfo(annotation=list[Any], required=False, default=[], description='list of arguments for function.'), 'kwargs': FieldInfo(annotation=dict[str, Any], required=False, default={}, description='dictionary of key word arguments for function.'), 'name': FieldInfo(annotation=str, required=True, description='Name of function.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

name: str

class extraction_methods.plugins.general_function.GeneralFunctionExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: general_function

Description:

Runs the function specified by ``function`` with the given ``args`` and ``kwargs``. The response is written to ``output_key`` if one is set; otherwise it is merged with the body.

Configuration Options:

- ``function``: ``REQUIRED`` Function to be run, given as ``name``, ``args``, and ``kwargs``.
- ``delimiter``: Optional text delimiter to put between module/function names. Default ``.``
- ``output_key``: Optional name of the key to output to; otherwise the response will be merged.

Example Configuration:

  - method: general_function
    inputs:
      function:
        name: import.path.to.the.function
        args:
          - hello
          - world
        kwargs:
          hello: world
          foo: bar

input_class

alias of GeneralFunctionInput

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict
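One plausible way to run a function from a dotted path is with ``importlib``, as sketched below. The helper names ``call_function`` and ``run_function`` are hypothetical, and the handling of ``output_key`` follows the option description and is not taken from the plugin's source.

```python
import importlib
from typing import Any, Optional


def call_function(
    name: str,
    args: Optional[list[Any]] = None,
    kwargs: Optional[dict[str, Any]] = None,
    delimiter: str = ".",
) -> Any:
    """Import the module part of name and call the trailing function."""
    module_path, _, function_name = name.rpartition(delimiter)
    # importlib expects dots, so normalise any custom delimiter.
    module = importlib.import_module(module_path.replace(delimiter, "."))
    function = getattr(module, function_name)
    return function(*(args or []), **(kwargs or {}))


def run_function(
    body: dict[str, Any], function: dict[str, Any], output_key: str = ""
) -> dict[str, Any]:
    """Store the result under output_key, or merge a dict result into body."""
    result = call_function(
        function["name"], function.get("args"), function.get("kwargs")
    )
    if output_key:
        body[output_key] = result
    else:
        body.update(result)
    return body


body = run_function(
    {"id": "abc"}, {"name": "math.pow", "args": [2, 3]}, output_key="power"
)
```

When no ``output_key`` is given, this sketch expects the function to return a dictionary so that it can be merged.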

class extraction_methods.plugins.general_function.GeneralFunctionInput(*, exists_key: str = '$', exists_delimiter: str = '.', function: Function, delimiter: str = '.', output_key: str = '')

Bases: Input

Model for General Function Input.

delimiter: str
function: Function
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'delimiter': FieldInfo(annotation=str, required=False, default='.', description='text delimiter to put between module/function names.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'function': FieldInfo(annotation=Function, required=True, description='Function to be run name maybe seperatated my delimieter.'), 'output_key': FieldInfo(annotation=str, required=False, default='', description='key to output to, else response will be merged with body.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_key: str

extraction_methods.plugins.geometry module

Geometry Method

class extraction_methods.plugins.geometry.GeometryExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: geometry

Description:

Accepts a dictionary of coordinate values and converts them to RFC 7946 formatted geometry.

Configuration Options:

- ``type``: ``REQUIRED`` Type of geometry to be produced.
- ``coordinates``: ``REQUIRED`` list of coordinates to convert to geometry. Ordering is respected.
- ``output_key``: key to output to.

Example Configuration:

  - method: geometry
    inputs:
      type: LineString
      coordinates:
        - [0, 0]
        - [$lon_2, $lat_2]

get_coordinates(coordinate_type: str, coordinates: list[Any]) list[Any]

Get coordinates

Parameters:
  • coordinate_type (str) – type of coordinates

  • coordinates (list) – list of coordinates

Returns:

coordinates

Return type:

list

input_class

alias of GeometryInput

line(coordinates: list[list[str | float]]) list[list[float]]

Get line coordinates

Parameters:

coordinates (list) – list of coordinates

Returns:

coordinates

Return type:

list

multi(coordinate_type: str, coordinates: list[Any]) list[Any]

Get coordinates for a multi-part geometry.

Parameters:
  • coordinate_type (str) – type of coordinates

  • coordinates (list) – list of coordinates

Returns:

coordinates

Return type:

list

point(coordinates: list[str | float]) list[float]

Get point coordinates

Parameters:

coordinates (list) – list of coordinates

Returns:

coordinates

Return type:

list

polygon(coordinates: list[list[str | float]]) list[list[list[float]]]

Get polygon coordinates

Parameters:

coordinates (list) – list of coordinates

Returns:

coordinates

Return type:

list

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict
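The sketch below shows how coordinate lists might be turned into an RFC 7946 geometry object for the ``Point``, ``LineString`` and ``Polygon`` types. Closing an open polygon ring automatically is an assumption of this example (RFC 7946 requires the first and last positions of a ring to match), and ``make_geometry`` is a hypothetical wrapper.

```python
from typing import Any, Union

Number = Union[str, float]


def point(coordinates: list[Number]) -> list[float]:
    """Convert one position to floats."""
    return [float(value) for value in coordinates]


def line(coordinates: list[list[Number]]) -> list[list[float]]:
    """Convert a list of positions, preserving their order."""
    return [point(position) for position in coordinates]


def polygon(coordinates: list[list[Number]]) -> list[list[list[float]]]:
    """Wrap one exterior ring, closing it if first and last positions differ."""
    ring = line(coordinates)
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))
    return [ring]


def make_geometry(geometry_type: str, coordinates: list[Any]) -> dict[str, Any]:
    """Build a GeoJSON-style geometry dictionary."""
    builders = {"Point": point, "LineString": line, "Polygon": polygon}
    return {
        "type": geometry_type,
        "coordinates": builders[geometry_type](coordinates),
    }


line_geometry = make_geometry("LineString", [[0, 0], ["1.5", 2]])
polygon_geometry = make_geometry("Polygon", [[0, 0], [1, 0], [1, 1]])
```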

class extraction_methods.plugins.geometry.GeometryInput(*, exists_key: str = '$', exists_delimiter: str = '.', type: Literal['Point', 'LineString', 'Polygon', 'MultiPointString', 'MultiLineString', 'MultiPolygon'], coordinates: list[Any], output_key: str = 'geometry')

Bases: Input

Model for Geometry Input.

coordinates: list[Any]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'coordinates': FieldInfo(annotation=list[Any], required=True, description='list of coordinates to convert to geometry. Ordering is respected.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'output_key': FieldInfo(annotation=str, required=False, default='geometry', description='key to output to.'), 'type': FieldInfo(annotation=Literal[str, str, str, str, str, str], required=True, description='Type of geometry to be produced.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_key: str
type: Literal['Point', 'LineString', 'Polygon', 'MultiPointString', 'MultiLineString', 'MultiPolygon']

extraction_methods.plugins.geometry_to_bbox module

Geometry to Bounding Box Method

class extraction_methods.plugins.geometry_to_bbox.GeometryToBboxExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: geometry_to_bbox

Description:

Accepts a geometry with a type and a list of coordinates and converts it to an RFC 7946, section 5 formatted bbox.

Configuration Options:

- ``geometry``: ``REQUIRED`` geometry to be converted to bbox.
- ``output_key``: key to output to.

Example Configuration:

    - method: geometry_to_bbox
      inputs:
        geometry:
          type: point
          coordinates:
            - 20
            - 0
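
The conversion described above can be sketched in plain Python. This is a minimal illustration, not the plugin's implementation: ``flatten_positions`` and ``bbox_from_coordinates`` are hypothetical helper names, and the sketch assumes 2D positions that do not cross the antimeridian.

```python
from typing import Any


def flatten_positions(coordinates: list[Any]) -> list[list[float]]:
    """Collect every [x, y] position from arbitrarily nested coordinates."""
    if coordinates and not isinstance(coordinates[0], (list, tuple)):
        # A bare position such as [20, 0].
        return [[float(value) for value in coordinates]]
    positions: list[list[float]] = []
    for item in coordinates:
        positions.extend(flatten_positions(item))
    return positions


def bbox_from_coordinates(coordinates: list[Any]) -> list[float]:
    """Return [west, south, east, north], the 2D order used by RFC 7946."""
    positions = flatten_positions(coordinates)
    xs = [position[0] for position in positions]
    ys = [position[1] for position in positions]
    return [min(xs), min(ys), max(xs), max(ys)]


point_bbox = bbox_from_coordinates([20, 0])
polygon_bbox = bbox_from_coordinates([[[0, 0], [10, 0], [10, 5], [0, 5], [0, 0]]])
```

For a point the western and eastern edges coincide, as do the southern and northern edges, so the box collapses to the point itself.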

get_bbox(coordinate_type: str, coordinates: list[Any]) list[float]

Get bbox from geometry

Parameters:
  • coordinate_type (str) – type of coordinates

  • coordinates (list) – list of coordinates

Returns:

bounding box of coordinates

Return type:

list

input_class

alias of GeometryToBboxInput

line(coordinates: list[list[float]]) list[float]

Get line bbox

Parameters:

coordinates (list) – list of coordinates

Returns:

bounding box of coordinates

Return type:

list

multi(coordinate_type: str, coordinates: list[Any]) list[float]

Get bbox for a multi-part geometry

Parameters:
  • coordinate_type (str) – type of coordinates

  • coordinates (list) – list of coordinates

Returns:

bounding box of coordinates

Return type:

list

point(coordinates: list[float]) list[float]

Get point bbox

Parameters:

coordinates (list) – list of coordinates

Returns:

bounding box of coordinates

Return type:

list

polygon(coordinates: list[list[float]]) list[float]

Get polygon bbox

Parameters:

coordinates (list) – list of coordinates

Returns:

bounding box of coordinates

Return type:

list

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict

class extraction_methods.plugins.geometry_to_bbox.GeometryToBboxInput(*, exists_key: str = '$', exists_delimiter: str = '.', geometry: dict[str, Any] = '$geometry', output_key: str = 'bbox')

Bases: Input

Model for Geometry to Bounding Box Input.

geometry: dict[str, Any]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'geometry': FieldInfo(annotation=dict[str, Any], required=False, default='$geometry', description='geometry to be converted to bbox.'), 'output_key': FieldInfo(annotation=str, required=False, default='bbox', description='key to output to.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_key: str

extraction_methods.plugins.hash module

Hash Method

class extraction_methods.plugins.hash.HashExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: hash

Description:

Hashes input string.

Configuration Options:

- ``hash_str``: string to be hashed.
- ``output_key``: key to output to.

Example configuration:

    method: hash
    inputs:
      hash_str: $model
      output_key: hashed_terms
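
As a rough illustration of what hashing a term and storing it under ``output_key`` could look like, here is a sketch using ``hashlib``. The choice of SHA-256 and the helper name ``hash_string`` are assumptions; the algorithm the plugin actually uses is not stated here.

```python
import hashlib


def hash_string(hash_str: str) -> str:
    """Return a hex digest of the input string."""
    # SHA-256 is an assumption made for this sketch only.
    return hashlib.sha256(hash_str.encode("utf-8")).hexdigest()


# "model" stands in for a previously extracted term referenced as $model.
body = {"model": "example-model"}
body["hashed_terms"] = hash_string(body["model"])
```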

input_class

alias of HashInput

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict

class extraction_methods.plugins.hash.HashInput(*, exists_key: str = '$', exists_delimiter: str = '.', hash_str: str, output_key: str)

Bases: Input

Model for Hash Input.

hash_str: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'hash_str': FieldInfo(annotation=str, required=True, description='string to be hashed.'), 'output_key': FieldInfo(annotation=str, required=True, description='key to output to.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_key: str

extraction_methods.plugins.iso19115 module

ISO 19115 Method

class extraction_methods.plugins.iso19115.ISO19115Extract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: iso19115

Description:

Takes a URL and sends a request to it to retrieve the ISO 19115 record.

Configuration Options:

- ``url``: ``REQUIRED`` URL to record store.
- ``dates``: ``REQUIRED`` List of ``key`` and ``output_key`` pairs for the date terms to retrieve from the response.

Example configuration:

    - method: iso19115
      inputs:
        url: $url
        dates:
          - key: './/gml:beginPosition'
            output_key: start_datetime
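
The lookup of a date by key can be sketched with the standard library, without the network request. The record below is a hypothetical, heavily trimmed stand-in for a real response, and ``extract_dates`` is an illustrative helper rather than the plugin's own code.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; a real ISO 19115 document is far larger.
RECORD = """
<root xmlns:gml="http://www.opengis.net/gml/3.2">
  <gml:TimePeriod>
    <gml:beginPosition>2000-01-01T00:00:00</gml:beginPosition>
    <gml:endPosition>2001-01-01T00:00:00</gml:endPosition>
  </gml:TimePeriod>
</root>
""".strip()

# Prefix-to-URI map so that "gml:" can be used in the search path.
NAMESPACES = {"gml": "http://www.opengis.net/gml/3.2"}


def extract_dates(record: str, dates: list[dict[str, str]]) -> dict[str, str]:
    """Find each configured key in the record and store its text."""
    root = ET.fromstring(record)
    output: dict[str, str] = {}
    for term in dates:
        element = root.find(term["key"], NAMESPACES)
        if element is not None and element.text:
            output[term["output_key"]] = element.text
    return output


extracted = extract_dates(
    RECORD, [{"key": ".//gml:beginPosition", "output_key": "start_datetime"}]
)
```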

input_class

alias of ISO19115Input

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict

class extraction_methods.plugins.iso19115.ISO19115Input(*, exists_key: str = '$', exists_delimiter: str = '.', url: str, dates: list[KeyOutputKey], request_timeout: int = 15)

Bases: Input

Model for ISO19115 Date Input.

dates: list[KeyOutputKey]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'dates': FieldInfo(annotation=list[KeyOutputKey], required=True, description='list of dates to extract.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'request_timeout': FieldInfo(annotation=int, required=False, default=15, description='request time out.'), 'url': FieldInfo(annotation=str, required=True, description='Url for record store.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

request_timeout: int
url: str

extraction_methods.plugins.iso_date module

ISO Date Method

class extraction_methods.plugins.iso_date.DateTerm(*, input_term: str, format: str = '%Y-%m-%dT%H:%M:%SZ', output_key: str = 'datetime')

Bases: BaseModel

Model for Date terms with format.

format: str
input_term: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'format': FieldInfo(annotation=str, required=False, default='%Y-%m-%dT%H:%M:%SZ', description='Format of the date.'), 'input_term': FieldInfo(annotation=str, required=True, description='Term to run method on.'), 'output_key': FieldInfo(annotation=str, required=False, default='datetime', description='Key to output to.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_key: str
class extraction_methods.plugins.iso_date.ISODateExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: iso_date

Description:

Takes the source dict and the key used to access the date, and converts the date to ISO 8601 format, e.g.

- ``YYYY-MM-DDTHH:MM:SS.ffffff``, if microsecond is not 0
- ``YYYY-MM-DDTHH:MM:SS``, if microsecond is 0

If the date cannot be parsed, it is removed from the source dict and an error is logged.

Configuration Options:

- ``date_terms``: ``REQUIRED`` List of keys to the date values. Using a list allows processing of multiple dates.
- ``format``: Optional format string. Default behaviour uses `dateutil.parser.parse <https://dateutil.readthedocs.io/en/stable/parser.html#dateutil.parser.parse>`_.
  If a format string is supplied, this will change to use `datetime.datetime.strptime <https://docs.python.org/3/library/datetime.html#datetime.datetime.strptime>`_.

Example Configuration:

    - method: iso_date
      inputs:
        date_terms:
          - input_term: $datetime
            output_key: date
            format: "%Y-%m-%dT%H:%M:%S"
          - input_term: "2012-12-12"
            format: "%Y-%m-%d"
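
The two output shapes listed above match what ``datetime.isoformat()`` produces, which the following sketch demonstrates. ``to_iso`` is a hypothetical helper; to stay within the standard library it uses ``datetime.fromisoformat`` as a stand-in for ``dateutil.parser.parse`` when no format is given, and it returns ``None`` where the plugin would drop the term and log an error.

```python
from datetime import datetime
from typing import Optional


def to_iso(value: str, date_format: Optional[str] = None) -> Optional[str]:
    """Parse a date string and return it in ISO 8601 form, or None on failure."""
    try:
        if date_format:
            parsed = datetime.strptime(value, date_format)
        else:
            # Stand-in for dateutil.parser.parse (assumption for this sketch).
            parsed = datetime.fromisoformat(value)
    except ValueError:
        return None
    return parsed.isoformat()


whole_seconds = to_iso("2012-12-12", "%Y-%m-%d")
with_microseconds = to_iso("2012-12-12T10:30:00.5", "%Y-%m-%dT%H:%M:%S.%f")
unparseable = to_iso("not a date", "%Y-%m-%d")
```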

input_class

alias of ISODateInput

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict

class extraction_methods.plugins.iso_date.ISODateInput(*, exists_key: str = '$', exists_delimiter: str = '.', date_terms: list[DateTerm] = [])

Bases: Input

Model for ISO Date Input.

date_terms: list[DateTerm]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'date_terms': FieldInfo(annotation=list[DateTerm], required=False, default=[], description='List of date terms.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

extraction_methods.plugins.json_file module

JSON File Method

class extraction_methods.plugins.json_file.JsonFileExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: json_file

Description:

Takes a list of keys to extract from the JSON file.

Configuration Options:

- ``path``: ``REQUIRED`` Path to directory of JSON files or single JSON file.
- ``properties``: ``REQUIRED`` List of properties to extract.

Example configuration:

    - method: json_file
      inputs:
        path: /path/to/file.json
        properties:
          - key: MIP_ERA
            output_key: mip_era
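
A minimal sketch of extracting configured keys from a single JSON file follows. It covers only top-level keys in one file and writes a temporary file so that it is self-contained; this ``extract_terms`` takes the property list as an argument, which is an assumption made for illustration and differs from the method signature documented below.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def extract_terms(path: Path, properties: list[dict[str, str]]) -> dict[str, Any]:
    """Read a JSON file and copy each requested key to its output key."""
    with path.open(encoding="utf-8") as handle:
        data = json.load(handle)
    return {
        prop["output_key"]: data[prop["key"]]
        for prop in properties
        if prop["key"] in data
    }


with tempfile.TemporaryDirectory() as directory:
    json_path = Path(directory) / "file.json"
    json_path.write_text(json.dumps({"MIP_ERA": "CMIP6", "other": 1}), encoding="utf-8")
    terms = extract_terms(json_path, [{"key": "MIP_ERA", "output_key": "mip_era"}])
```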

extract_terms(path: Path) dict[str, Any]

Extract terms from JSON file(s) at path.

Parameters:

path (Path) – path to file

Returns:

extracted terms

Return type:

dict

find_and_extract() dict[str, Any]

Find and extract from JSON files.

Returns:

extracted terms

Return type:

dict

input_class

alias of JsonFileInput

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict

class extraction_methods.plugins.json_file.JsonFileInput(*, exists_key: str = '$', exists_delimiter: str = '.', path: str, properties: list[KeyOutputKey])

Bases: Input

Model for JSON File Input.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'path': FieldInfo(annotation=str, required=True, description='Path to directory of JSON files or single JSON file.'), 'properties': FieldInfo(annotation=list[KeyOutputKey], required=True, description='list of properties to extract.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

path: str
properties: list[KeyOutputKey]

extraction_methods.plugins.lambda module

Lambda Method

class extraction_methods.plugins.lambda.LambdaExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: lambda

Description:

Accepts a dictionary. String values are popped from the dictionary and are put back into the dictionary with the key specified.

Configuration Options:

- ``function``: ``REQUIRED`` lambda function to be run.
- ``output_key``: Optional name of the key to output to; otherwise the response will be merged.
- ``args``: Optional list of arguments for the function. Use ``$`` for previously extracted terms.
- ``kwargs``: Optional dictionary of keyword arguments for the function. Use ``$`` for previously extracted terms.

Example Configuration:

    - method: lambda
      inputs:
        function: 'lambda x: x * x'
        args:
          - hello
          - $world
        kwargs:
          hello: world
          goodbye: all
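
One way a configured lambda string could be applied to resolved arguments is sketched below. Both ``resolve`` and ``run_lambda`` are hypothetical helpers, and the use of ``eval`` is an assumption about the mechanism, not a statement of how the plugin works.

```python
from typing import Any


def resolve(value: Any, body: dict[str, Any], exists_key: str = "$") -> Any:
    """Swap a "$name" string for body["name"]; leave anything else untouched."""
    if isinstance(value, str) and value.startswith(exists_key):
        return body[value[len(exists_key):]]
    return value


def run_lambda(
    function: str,
    args: list[Any],
    kwargs: dict[str, Any],
    body: dict[str, Any],
    output_key: str,
) -> dict[str, Any]:
    """Evaluate the lambda string and store its result under output_key."""
    # eval() executes arbitrary code, so only use it on trusted configuration.
    func = eval(function)
    resolved_args = [resolve(arg, body) for arg in args]
    resolved_kwargs = {name: resolve(value, body) for name, value in kwargs.items()}
    body[output_key] = func(*resolved_args, **resolved_kwargs)
    return body


result = run_lambda("lambda x, y=1: x * y", ["$world"], {"y": 2}, {"world": 3}, "product")
```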

input_class

alias of LambdaInput

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict

class extraction_methods.plugins.lambda.LambdaInput(*, exists_key: str = '$', exists_delimiter: str = '.', function: str, args: list[Any] = [], kwargs: dict[str, Any] = {}, output_key: str = 'label')

Bases: Input

Model for Lambda Input.

args: list[Any]
function: str
kwargs: dict[str, Any]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'args': FieldInfo(annotation=list[Any], required=False, default=[], description='list of arguments for function.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'function': FieldInfo(annotation=str, required=True, description='lambda function to be run.'), 'kwargs': FieldInfo(annotation=dict[str, Any], required=False, default={}, description='dictionary of key word arguments for function.'), 'output_key': FieldInfo(annotation=str, required=False, default='label', description='key to output to.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_key: str

extraction_methods.plugins.netcdf module

NetCDF Method

class extraction_methods.plugins.netcdf.NetCDFExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: netcdf

Description: Processes XML documents to extract metadata

Configuration Options:

- ``extraction_keys``: List of keys to retrieve from the document.
- ``filter_expr``: Regex to match against files to limit the attempts to known files.
- ``namespaces``: Map of namespaces.

Extraction Keys:

Extraction keys should be a map with the following entries:

- ``output_key``: Name of the outputted attribute.
- ``key``: Access key to extract the required data. Passed to ``xml.etree.ElementTree.find()`` and also supports XPath-formatted accessors.
- ``attribute``: Allows you to select from the element attributes. In the absence of this value, the default behaviour is to access the text value of the key. In some cases, you might want to access an attribute of the element instead.

Example configuration:

    - method: xml
      inputs:
        filter_expr: '.manifest$'
        extraction_keys:
          - name: start_datetime
            key: './/gml:beginPosition'
            attribute: start

input_class

alias of NetCDFInput

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict

class extraction_methods.plugins.netcdf.NetCDFInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri', variable_id: str = '$uri', variable_attributes: list[KeyOutputKey] = [], global_attributes: list[KeyOutputKey] = [], cf_attributes: list[KeyOutputKey] = [], rio_attributes: list[KeyOutputKey] = [])

Bases: Input

Model for NetCDF Input.

cf_attributes: list[KeyOutputKey]
global_attributes: list[KeyOutputKey]
input_term: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'cf_attributes': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of cf attributes to extract.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'global_attributes': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of global attributes to extract.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='term for method to run on.'), 'rio_attributes': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of rio attributes to extract.'), 'variable_attributes': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of variable attributes to extract.'), 'variable_id': FieldInfo(annotation=str, required=False, default='$uri', description='lambda function to be run.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

rio_attributes: list[KeyOutputKey]
variable_attributes: list[KeyOutputKey]
variable_id: str

extraction_methods.plugins.open_zip module

Open Zip Method

class extraction_methods.plugins.open_zip.ZipExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: open_zip

Description:

Open a zip file and read inner files

Configuration Options:

- ``input_term``: Term for method to run on.
- ``inner_files``: List of inner zipped files to be read.
- ``output_key``: key to output to.

Example configuration:

    - method: open_zip
      inputs:
        input_term: /path/to/a/file
        inner_files:
          - key: hello.txt
            output_key: world
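
Reading named inner files from an archive can be sketched with ``zipfile``. The example builds a small archive in memory so that it runs on its own; ``read_inner_files`` is a hypothetical helper, and decoding the contents as UTF-8 text is an assumption.

```python
import io
import zipfile
from typing import BinaryIO


def read_inner_files(archive: BinaryIO, inner_files: list[dict[str, str]]) -> dict[str, str]:
    """Read each requested member of the archive and store it under its output key."""
    output: dict[str, str] = {}
    with zipfile.ZipFile(archive) as zip_file:
        for inner in inner_files:
            with zip_file.open(inner["key"]) as handle:
                output[inner["output_key"]] = handle.read().decode("utf-8")
    return output


# Build an in-memory archive containing a single text file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zip_file:
    zip_file.writestr("hello.txt", "hi there")
buffer.seek(0)

contents = read_inner_files(buffer, [{"key": "hello.txt", "output_key": "world"}])
```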

input_class

alias of ZipInput

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict

class extraction_methods.plugins.open_zip.ZipInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri', inner_files: list[KeyOutputKey] = [], output_key: str = '')

Bases: Input

Model for Zip Input.

check_root_read() Self
inner_files: list[KeyOutputKey]
input_term: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'inner_files': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of inner zipped files to be read.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='term for method to run on.'), 'output_key': FieldInfo(annotation=str, required=False, default='', description='key to output to.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_key: str

extraction_methods.plugins.path_parts module

Path Parts Method

class extraction_methods.plugins.path_parts.PathPartsExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: path_parts

Description:

Extracts the parts of a given path, skipping the first ``skip`` top-level parts.

Configuration Options:

- ``path``: Path for method to run on. ``default: $uri``
- ``skip``: The number of path parts to skip. ``default: 0``

Example configuration:

    - method: path_parts
      inputs:
        path: $uri
        skip: 2
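
The effect of ``skip`` can be illustrated with ``pathlib``. This sketch assumes POSIX-style paths and assumes the leading root is not counted as a part; the plugin may treat either differently, and ``path_parts`` here is only an illustrative helper.

```python
from pathlib import PurePosixPath


def path_parts(path: str, skip: int = 0) -> list[str]:
    """Split a path into its parts and drop the first `skip` of them."""
    # The root "/" is discarded so that skip counts named directories only.
    parts = [part for part in PurePosixPath(path).parts if part != "/"]
    return parts[skip:]


remaining = path_parts("/archive/project/data/model/file.nc", skip=2)
everything = path_parts("/archive/project/file.nc")
```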

input_class

alias of PathPartsInput

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict

class extraction_methods.plugins.path_parts.PathPartsInput(*, exists_key: str = '$', exists_delimiter: str = '.', path: str = '$uri', skip: int = 0)

Bases: Input

Model for Path Parts Input.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'path': FieldInfo(annotation=str, required=False, default='$uri', description='path for method to run on.'), 'skip': FieldInfo(annotation=int, required=False, default=0, description='number of path parts to skip.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

path: str
skip: int

extraction_methods.plugins.regex module

Regex Method

class extraction_methods.plugins.regex.RegexExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: regex

Description:

Takes an input string and a regex with named capture groups and returns a dictionary of the values extracted using the named capture groups.

Configuration Options:

- ``input_term``: Term for regex to be run on.
- ``regex``: ``REQUIRED`` The regular expression to match against.

Example configuration:

    - method: regex
      inputs:
        regex: '^(?:[^_]*_){2}(?P<datetime>\d*)'
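
Named capture groups map directly onto a dictionary through ``re.Match.groupdict()``, as this sketch shows. ``extract_named_groups`` is a hypothetical helper, the sample file name is invented, and the use of ``re.search`` rather than another matching function is an assumption.

```python
import re


def extract_named_groups(input_term: str, regex: str) -> dict[str, str]:
    """Return the named groups captured from input_term, or {} if nothing matches."""
    match = re.search(regex, input_term)
    return match.groupdict() if match else {}


# Skip two underscore-terminated fields, then capture the run of digits.
pattern = r"^(?:[^_]*_){2}(?P<datetime>\d*)"
captured = extract_named_groups("sensor_v1_20200101_tile.nc", pattern)
no_match = extract_named_groups("nounderscores", pattern)
```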

input_class

alias of RegexInput

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict

class extraction_methods.plugins.regex.RegexInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri', regex: str)

Bases: Input

Model for Regex Input.

input_term: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='term for method to run on.'), 'regex': FieldInfo(annotation=str, required=True, description='The regular expression to match against.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

regex: str

extraction_methods.plugins.regex_label module

Regex Label Method

class extraction_methods.plugins.regex_label.RegexLabelExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: regex_label

Description:

Adds a label if the input term fully matches the regex.

Configuration Options:

- ``input_term``: term for method to run on.
- ``label``: ``REQUIRED`` Label to add if regex passes.
- ``regex``: ``REQUIRED`` Regex to test against.
- ``allow_multiple``: True if multiple labels are allowed.
- ``output_key``: Term for method to output to.

Example configuration:

    - method: regex_label
      inputs:
        label: metadata
        regex: README
        allow_multiple: true
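
A full match differs from a search in that the whole input must match the pattern, which the sketch below makes concrete. ``add_label`` is a hypothetical helper; storing labels in a list when ``allow_multiple`` is true and as a single value otherwise is an assumption about the output shape.

```python
import re
from typing import Any


def add_label(
    body: dict[str, Any],
    input_term: str,
    regex: str,
    label: str,
    allow_multiple: bool = True,
    output_key: str = "label",
) -> dict[str, Any]:
    """Add the label only when the whole input term matches the regex."""
    if re.fullmatch(regex, input_term) is None:
        return body
    if allow_multiple:
        # Assumed shape: labels accumulate in a list.
        body.setdefault(output_key, []).append(label)
    else:
        body[output_key] = label
    return body


labelled = add_label({}, "README", "README", "metadata")
unlabelled = add_label({}, "README.md", "README", "metadata")
```

Here ``README.md`` is left unlabelled because the pattern only covers part of the string.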

input_class

alias of RegexLabelInput

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict

class extraction_methods.plugins.regex_label.RegexLabelInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri', label: str, regex: str, allow_multiple: bool = True, output_key: str = 'label')

Bases: Input

Model for Regex Label Input.

allow_multiple: bool
input_term: str
label: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'allow_multiple': FieldInfo(annotation=bool, required=False, default=True, description='True if multiple labels are allowed.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='term for method to run on.'), 'label': FieldInfo(annotation=str, required=True, description='Label to add if regex passes.'), 'output_key': FieldInfo(annotation=str, required=False, default='label', description='Term for method to output to.'), 'regex': FieldInfo(annotation=str, required=True, description='Regex to test against.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_key: str
regex: str

extraction_methods.plugins.regex_rename module

Regex Rename Method

class extraction_methods.plugins.regex_rename.RegexOutputKey(*, exists_key: str = '$', exists_delimiter: str = '.', regex: str, output_key: str)

Bases: Input

Model for Regex.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'output_key': FieldInfo(annotation=str, required=True, description='Term for method to output to.'), 'regex': FieldInfo(annotation=str, required=True, description='Regex to test against.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_key: str
regex: str
class extraction_methods.plugins.regex_rename.RegexRenameExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: regex_rename

Description:

Takes a list of regex and output key combinations. Any existing properties that fully match a regex are renamed to the output key. Later regexes take precedence.

Configuration Options:

- ``regex_swaps``: Regex and output key combinations.

Example configuration:

    - method: regex_rename
      inputs:
        regex_swaps:
          - regex: README
            output_key: metadata
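
The renaming rule can be sketched for flat, top-level keys as follows. ``regex_rename`` is a hypothetical helper that ignores nested keys and the ``delimiter`` option, and it reads "later regexes take precedence" as: when two swaps write to the same output key, the later one overwrites the earlier one. That reading is an assumption.

```python
import re
from typing import Any


def regex_rename(body: dict[str, Any], regex_swaps: list[dict[str, str]]) -> dict[str, Any]:
    """Rename every key that fully matches a regex to the paired output key."""
    for swap in regex_swaps:
        # Iterate over a snapshot because the dict changes during the loop.
        for key in list(body):
            if re.fullmatch(swap["regex"], key):
                body[swap["output_key"]] = body.pop(key)
    return body


renamed = regex_rename(
    {"README": "text", "size": 1},
    [{"regex": "README", "output_key": "metadata"}],
)
collided = regex_rename(
    {"a": 1, "b": 2},
    [{"regex": "a", "output_key": "out"}, {"regex": "b", "output_key": "out"}],
)
```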

input_class

alias of RegexRenameInput

matching_keys(keys: KeysView[str], key_regex: str) list[str]

Find all keys that match regex

Parameters:
  • keys (KeysView) – dictionary keys to test

  • key_regex (str) – regex to test against

Returns:

matching keys

Return type:

list

rename(body: dict[str, Any], key_parts: list[str], output_key: str) dict[str, Any]

Rename terms

Parameters:
  • body (dict) – current body

  • key_parts (list) – key parts separated by delimiter

  • output_key (str) – key to rename the term to

Returns:

updated body

Return type:

dict

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict

class extraction_methods.plugins.regex_rename.RegexRenameInput(*, exists_key: str = '$', exists_delimiter: str = '.', regex_swaps: list[RegexOutputKey], delimiter: str = '')

Bases: Input

Model for Regex Rename Input.

delimiter: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'delimiter': FieldInfo(annotation=str, required=False, default='', description='delimiter for nested term.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'regex_swaps': FieldInfo(annotation=list[RegexOutputKey], required=True, description='Regex and output key combinations.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

regex_swaps: list[RegexOutputKey]

extraction_methods.plugins.regex_type_cast module

Regex Type Cast Method

class extraction_methods.plugins.regex_type_cast.RegexCastType(*, exists_key: str = '$', exists_delimiter: str = '.', regex: str, cast_type: str)

Bases: Input

Model for Regex Cast Type.

cast_type: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'cast_type': FieldInfo(annotation=str, required=True, description='Python type to cast to.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'regex': FieldInfo(annotation=str, required=True, description='Regex to test against.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

regex: str
class extraction_methods.plugins.regex_type_cast.RegexTypeCastExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: regex_type_cast

Description:

Takes a list of regex and cast type combinations. Any existing properties that fully match a regex are cast to the associated type.

Configuration Options:

- ``regex_casts``: Regex and cast type combinations.

Example configuration:

    - method: regex_type_cast
      inputs:
        regex_casts:
          - regex: cloud_cover
            cast_type: int
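
Casting matched properties can be sketched with a small lookup from type name to constructor. The ``CAST_TYPES`` table and the ``regex_type_cast`` helper are hypothetical, and the sketch assumes the regex is tested against property names rather than values; how the plugin resolves ``cast_type`` is not stated here.

```python
import re
from typing import Any, Callable

# Assumed set of supported type names; an explicit table avoids eval().
CAST_TYPES: dict[str, Callable[[Any], Any]] = {"int": int, "float": float, "str": str}


def regex_type_cast(body: dict[str, Any], regex_casts: list[dict[str, str]]) -> dict[str, Any]:
    """Cast the value of every key that fully matches a regex."""
    for cast in regex_casts:
        caster = CAST_TYPES[cast["cast_type"]]
        for key in body:
            if re.fullmatch(cast["regex"], key):
                body[key] = caster(body[key])
    return body


cast_body = regex_type_cast(
    {"cloud_cover": "12", "title": "scene"},
    [{"regex": "cloud_cover", "cast_type": "int"}],
)
```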

input_class

alias of RegexTypeCastInput

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict

class extraction_methods.plugins.regex_type_cast.RegexTypeCastInput(*, exists_key: str = '$', exists_delimiter: str = '.', regex_casts: list[RegexCastType])

Bases: Input

Model for Regex Cast Type Input.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'regex_casts': FieldInfo(annotation=list[RegexCastType], required=True, description='Regex and cast type combinations.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

regex_casts: list[RegexCastType]

extraction_methods.plugins.remove module

Remove Method

class extraction_methods.plugins.remove.RemoveExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: remove

Description:

Removes keys from the body.

Configuration Options:

- ``keys``: ``REQUIRED`` list of keys to remove.
- ``delimiter``: delimiter for nested key.

Example Configuration:

.. code-block:: yaml

    - method: remove
      inputs:
        keys:
          - hello
          - world

input_class

alias of RemoveInput

matching_keys(keys: KeysView[str], key_regex: str) list[str]

Find all keys that match the regex.

Parameters:
  • keys (KeysView) – dictionary keys to test

  • key_regex (str) – regex to test against

Returns:

matching keys

Return type:

list

remove_key(body: dict[str, Any], key_parts: list[str]) dict[str, Any]

Remove nested terms

Parameters:
  • body (dict) – current body

  • key_parts (list) – key parts separated by delimiter

Returns:

updated body

Return type:

dict

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict
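
How `matching_keys` and `remove_key` might cooperate to remove nested keys can be sketched as follows. This is an assumption-laden illustration, not the plugin's code: the standalone functions below only mirror the documented signatures, and the choice of `re.fullmatch` and of recursing into nested dictionaries is a guess at the intended behaviour.

```python
import re
from collections.abc import KeysView
from typing import Any


def matching_keys(keys: KeysView[str], key_regex: str) -> list[str]:
    """Return the keys that match the regex (assumed to be a full match)."""
    return [key for key in keys if re.fullmatch(key_regex, key)]


def remove_key(body: dict[str, Any], key_parts: list[str]) -> dict[str, Any]:
    """Remove a possibly nested key described by its delimiter-split parts."""
    head, *rest = key_parts
    # matching_keys returns a list, so deleting from body here is safe.
    for key in matching_keys(body.keys(), head):
        if not rest:
            del body[key]
        elif isinstance(body[key], dict):
            body[key] = remove_key(body[key], rest)
    return body


body = {"hello": 1, "world": {"foo": 2, "bar": 3}, "keep": 4}
delimiter = "."
for key in ["hello", "world.foo"]:
    body = remove_key(body, key.split(delimiter))
```

After the loop, `body` is `{"world": {"bar": 3}, "keep": 4}`. Because each key is split on the delimiter before matching, a regex that itself contains the delimiter character would need a different delimiter in this sketch.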

class extraction_methods.plugins.remove.RemoveInput(*, exists_key: str = '$', exists_delimiter: str = '.', keys: list[str], delimiter: str = '.')

Bases: Input

Model for Remove Input.

delimiter: str
keys: list[str]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'delimiter': FieldInfo(annotation=str, required=False, default='.', description='delimiter for nested term.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'keys': FieldInfo(annotation=list[str], required=True, description='list of keys to remove.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

extraction_methods.plugins.stac_extension module

STAC Extension Method

class extraction_methods.plugins.stac_extension.STACExtension(*, url: str, prefix: str, properties: list[str])

Bases: BaseModel

Model for STAC Extension.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'prefix': FieldInfo(annotation=str, required=True, description='Extension prefix.'), 'properties': FieldInfo(annotation=list[str], required=True, description='Extension properties.'), 'url': FieldInfo(annotation=str, required=True, description='Extension URL.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

prefix: str
properties: list[str]
url: str
class extraction_methods.plugins.stac_extension.STACExtensionExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: stac_extension

Description:

Accepts a list of extensions, each of which contains a URL, a prefix, and a list of properties.

Configuration Options:

- ``extensions``: ``REQUIRED`` List of extensions.

Example Configuration:

.. code-block:: yaml

    - method: stac_extension
      inputs:
        extensions:
          - url: hello.com/v1.0.0/world.json
            prefix: hello
            properties:
              - foo
              - bar

input_class

alias of STACExtensionInput

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict

class extraction_methods.plugins.stac_extension.STACExtensionInput(*, exists_key: str = '$', exists_delimiter: str = '.', extensions: list[STACExtension])

Bases: Input

Model for STAC Extension Input.

extensions: list[STACExtension]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'extensions': FieldInfo(annotation=list[STACExtension], required=True, description='List of extensions.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

extraction_methods.plugins.string_template module

String Template Method

class extraction_methods.plugins.string_template.StringTemplateExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: string_template

Description:

Accepts a template and an output_key. Terms are added to the template, and the result is written to output_key.

Configuration Options:

- ``template``: ``REQUIRED`` Template to follow.
- ``descructive``: True if terms should be removed after templating.
- ``output_key``: ``REQUIRED`` key to output to.

Example Configuration:

.. code-block:: yaml

    - method: string_template
      inputs:
        template: "{hello}/{goodbye}/{hello}/bonjour.html"
        output_key: manifest_url

input_class

alias of StringTemplateInput

run(body: dict[str, Any]) Any

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict
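
The templating step can be sketched with the standard library alone. The function below is hypothetical and is not the plugin's implementation; it assumes the template uses `str.format` style placeholders, as the example configuration suggests, and it names its flag `destructive`, whereas the documented option is spelled `descructive`.

```python
from string import Formatter
from typing import Any


def string_template(
    body: dict[str, Any],
    template: str,
    output_key: str,
    destructive: bool = False,
) -> dict[str, Any]:
    """Fill the template from body and store the result under output_key."""
    # Collect the placeholder names used in the template.
    fields = {name for _, name, _, _ in Formatter().parse(template) if name}
    body[output_key] = template.format_map(body)
    if destructive:
        # Assumption: "removing terms" means dropping the used keys from body.
        for name in fields:
            if name != output_key:
                body.pop(name, None)
    return body


result = string_template(
    {"hello": "a", "goodbye": "b"},
    "{hello}/{goodbye}/{hello}/bonjour.html",
    "manifest_url",
    destructive=True,
)
```

With these inputs, `result` is `{"manifest_url": "a/b/a/bonjour.html"}`; with `destructive=False` the `hello` and `goodbye` terms would remain in the body.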

class extraction_methods.plugins.string_template.StringTemplateInput(*, exists_key: str = '$', exists_delimiter: str = '.', template: str, descructive: bool = False, output_key: str)

Bases: Input

Model for String Template Input.

descructive: bool
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'descructive': FieldInfo(annotation=bool, required=False, default=False, description='True if terms should be removed after templating.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'output_key': FieldInfo(annotation=str, required=True, description='key to output to.'), 'template': FieldInfo(annotation=str, required=True, description='Template to follow.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

output_key: str
template: str

extraction_methods.plugins.xml module

XML Method

class extraction_methods.plugins.xml.XMLExtract(*args: Any, **kwargs: Any)

Bases: ExtractionMethod

Method: xml

Description:

Processes XML documents to extract metadata.

Configuration Options:

- ``input_term``: Term for method to run on.
- ``template``: ``REQUIRED`` Template to follow.
- ``properties``: ``REQUIRED`` List of properties to retrieve from the document.
- ``namespaces``: ``REQUIRED`` Map of namespaces.

Extraction Keys:

Extraction keys should be a map.

.. list-table::
   :header-rows: 1

   * - Name
     - Description
   * - ``key``
     - Key of the property. Passed to ``xml.etree.ElementTree.find()``; XPath-formatted accessors are also supported.
   * - ``output_key``
     - Key to output to.
   * - ``attribute``
     - Selects an attribute of the element. If this value is absent, the default behaviour is to use the text value of the element matched by ``key``.

Example configuration:

.. code-block:: yaml

    - method: xml
      inputs:
        properties:
          - name: start_datetime
            key: './/gml:beginPosition'
            attribute: start


input_class

alias of XMLInput

run(body: dict[str, Any]) dict[str, Any]

Run the method.

Parameters:

body (dict) – current generated properties

Returns:

updated body dict

Return type:

dict
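
The extraction keys described for the xml method can be illustrated with `xml.etree.ElementTree`. This is a sketch under stated assumptions rather than the plugin's code: the function `extract_xml`, the inline sample document, and the use of plain dictionaries for the properties are hypothetical, and the real method reads its document from `input_term` instead of taking a string.

```python
import xml.etree.ElementTree as ET
from typing import Any

# Hypothetical sample document used only for this illustration.
XML_DOC = """<root xmlns:gml="http://www.opengis.net/gml">
  <gml:TimePeriod>
    <gml:beginPosition start="2020-01-01">2020-01-01T00:00:00Z</gml:beginPosition>
  </gml:TimePeriod>
</root>"""


def extract_xml(
    document: str,
    properties: list[dict[str, str]],
    namespaces: dict[str, str],
) -> dict[str, Any]:
    """Extract text or attribute values for each property (hypothetical)."""
    root = ET.fromstring(document)
    output: dict[str, Any] = {}
    for prop in properties:
        # find() resolves prefixes such as "gml:" through the namespaces map.
        element = root.find(prop["key"], namespaces)
        if element is None:
            continue
        attribute = prop.get("attribute", "")
        # Without an attribute, fall back to the element's text value.
        value = element.get(attribute) if attribute else element.text
        output[prop.get("output_key") or prop["key"]] = value
    return output


result = extract_xml(
    XML_DOC,
    [
        {"key": ".//gml:beginPosition", "output_key": "start_datetime"},
        {"key": ".//gml:beginPosition", "output_key": "start_attr", "attribute": "start"},
    ],
    {"gml": "http://www.opengis.net/gml"},
)
```

The first property has no `attribute`, so the element text is returned; the second reads the `start` attribute of the same element.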

class extraction_methods.plugins.xml.XMLInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri', properties: list[XMLProperty], namespaces: dict[str, str])

Bases: Input

Model for XML Input.

input_term: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='Term for method to run on.'), 'namespaces': FieldInfo(annotation=dict[str, str], required=True, description='Map of namespaces.'), 'properties': FieldInfo(annotation=list[XMLProperty], required=True, description='List of properties to retrieve from the document.')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

namespaces: dict[str, str]
properties: list[XMLProperty]
class extraction_methods.plugins.xml.XMLProperty(*, key: str, output_key: str = '', attribute: str = '')

Bases: KeyOutputKey

Model for XML property.

attribute: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {'attribute': FieldInfo(annotation=str, required=False, default='', description='Attribute of the XML property.'), 'key': FieldInfo(annotation=str, required=True), 'output_key': FieldInfo(annotation=str, required=False, default='')}

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

Module contents