extraction_methods.plugins package
Subpackages
Submodules
extraction_methods.plugins.bbox module
Bounding Box Method
- class extraction_methods.plugins.bbox.BboxExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
bbox
- Description:
Converts coordinate values to an RFC 7946, section 5 formatted bbox.
Configuration Options:
- ``west``: ``REQUIRED`` Most westerly coordinate
- ``south``: ``REQUIRED`` Most southerly coordinate
- ``east``: ``REQUIRED`` Most easterly coordinate
- ``north``: ``REQUIRED`` Most northerly coordinate
Example Configuration:
.. code-block:: yaml

    method: bbox
    inputs:
      west: 0
      south: 0
      east: $east_variable
      north: $north_variable
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
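The conversion can be illustrated with a minimal sketch. It assumes that string inputs beginning with ``exists_key`` (``$``) refer to previously extracted values in ``body``, that ``exists_delimiter`` separates nested keys, and that the result is stored under a ``bbox`` key. The helpers ``resolve`` and ``build_bbox`` are hypothetical and are not part of the package. RFC 7946, section 5 orders a 2D bbox as west, south, east, north.

```python
from typing import Any


def resolve(value: Any, body: dict[str, Any], exists_key: str = "$", exists_delimiter: str = ".") -> Any:
    """Look up values such as "$name" or "$outer.inner" in body; return anything else unchanged."""
    if not (isinstance(value, str) and value.startswith(exists_key)):
        return value
    current: Any = body
    for part in value[len(exists_key):].split(exists_delimiter):
        current = current[part]
    return current


def build_bbox(inputs: dict[str, Any], body: dict[str, Any]) -> list[float]:
    """Build a 2D bbox in RFC 7946 order: [west, south, east, north]."""
    return [float(resolve(inputs[key], body)) for key in ("west", "south", "east", "north")]


# Hypothetical previously extracted properties.
body = {"east_variable": 10.5, "north_variable": "20"}
inputs = {"west": 0, "south": 0, "east": "$east_variable", "north": "$north_variable"}
body["bbox"] = build_bbox(inputs, body)
print(body["bbox"])
```

Casting every value to ``float`` keeps the output uniform whether a coordinate was given literally or resolved from an earlier extraction step.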
- class extraction_methods.plugins.bbox.BboxInput(*, exists_key: str = '$', exists_delimiter: str = '.', west: float | str, south: float | str, east: float | str, north: float | str)
Bases:
Input
Model for BBox Method Input.
- east: float | str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'east': FieldInfo(annotation=Union[float, str], required=True, description='east coordinate.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'north': FieldInfo(annotation=Union[float, str], required=True, description='north coordinate.'), 'south': FieldInfo(annotation=Union[float, str], required=True, description='south coordinate.'), 'west': FieldInfo(annotation=Union[float, str], required=True, description='west coordinate.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- north: float | str
- south: float | str
- west: float | str
extraction_methods.plugins.ceda_observation module
CEDA Observation Method
- class extraction_methods.plugins.ceda_observation.CEDAObservationExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
ceda_observation
- Description:
Returns a CEDA observation record for the ``input_term``.
Configuration Options:
- ``input_term``: ``REQUIRED`` term for method to run on
Example Configuration:
.. code-block:: yaml

    method: ceda_observation
    inputs:
      input_term: $url
- input_class
alias of
CEDAObservationInput
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
- class extraction_methods.plugins.ceda_observation.CEDAObservationInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri', request_timeout: int = 15, output_key: str = 'uuid')
Bases:
Input
Model for CEDA Observation Method Input.
- input_term: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='term for method to run on.'), 'output_key': FieldInfo(annotation=str, required=False, default='uuid', description='key to output to.'), 'request_timeout': FieldInfo(annotation=int, required=False, default=15, description='request time out.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- output_key: str
- request_timeout: int
extraction_methods.plugins.ceda_vocabulary module
CEDA Vocabulary Method
- class extraction_methods.plugins.ceda_vocabulary.CEDAVocabularyExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
ceda_vocabulary
- Description:
Validates and sorts properties into vocabs and generates the general vocab for specified properties.
Configuration Options:
- ``url``: ``REQUIRED`` URL of vocabulary server
- ``namespace``: ``REQUIRED`` namespace of vocab for terms
- ``terms``: Terms to be validated
- ``strict``: Boolean on whether values should be validated
- ``request_timeout``: request timeout
Example configuration:
.. code-block:: yaml

    method: ceda_vocabulary
    inputs:
      url: vocab.ceda.ac.uk
      namespace: cmip6
      strict: False
      terms:
        - start_time
        - model
- input_class
alias of
CEDAVocabularyInput
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
- class extraction_methods.plugins.ceda_vocabulary.CEDAVocabularyInput(*, exists_key: str = '$', exists_delimiter: str = '.', url: str, namespace: str, strict: bool = False, terms: list[str] = [], request_timeout: int = 15)
Bases:
Input
Model for CEDA Vocab Method Input.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'namespace': FieldInfo(annotation=str, required=True, description='Namespace for vocab terms.'), 'request_timeout': FieldInfo(annotation=int, required=False, default=15, description='request time out.'), 'strict': FieldInfo(annotation=bool, required=False, default=False, description='True if values should be validated.'), 'terms': FieldInfo(annotation=list[str], required=False, default=[], description='terms to be validated.'), 'url': FieldInfo(annotation=str, required=True, description='URL of vocabulary server.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- namespace: str
- request_timeout: int
- strict: bool
- terms: list[str]
- url: str
extraction_methods.plugins.controlled_vocabulary module
extraction_methods.plugins.datetime_bound_to_centroid module
Datetime Bound to Centroid Method
- class extraction_methods.plugins.datetime_bound_to_centroid.DatetimeBoundToCentroidExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
datetime_bound_to_centroid
- Description:
Accepts a start and an end datetime bound and outputs the datetime at the centroid (midpoint) of the two.
Configuration Options:
- ``start_datetime``: Start datetime bound
- ``start_format``: Format of the start datetime
- ``end_datetime``: End datetime bound
- ``end_format``: Format of the end datetime
- ``output_key``: Term for method to output to
- ``output_format``: Format of the output datetime
Example Configuration:
.. code-block:: yaml

    method: datetime_bound_to_centroid
    inputs:
      start_datetime: $start_date
      end_datetime: "2022-02-02"
      end_format: "%Y-%m-%d"
      output_key: polygon
- input_class
alias of
DatetimeBoundToCentroidInput
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
- strip_time(datetime_str: str, datetime_format: str) datetime
Parse a datetime from a string using the given format.
- Parameters:
datetime_str (str) – string to convert to datetime
datetime_format (str) – format of datetime string
- Returns:
datetime object
- Return type:
datetime
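The method name and the ``start_*``/``end_*`` inputs suggest that the output is the midpoint of the two bounds. A minimal sketch under that assumption, using the default formats from the input model; ``datetime_centroid`` is a hypothetical helper and not the package's implementation:

```python
from datetime import datetime


def datetime_centroid(
    start: str,
    end: str,
    start_format: str = "%Y-%m-%dT%H:%M:%S",
    end_format: str = "%Y-%m-%dT%H:%M:%S",
    output_format: str = "%Y-%m-%dT%H:%M:%SZ",
) -> str:
    """Return the midpoint between two datetime strings, formatted with output_format."""
    start_dt = datetime.strptime(start, start_format)
    end_dt = datetime.strptime(end, end_format)
    # Half of the interval added to the start gives the centroid.
    centroid = start_dt + (end_dt - start_dt) / 2
    return centroid.strftime(output_format)


result = datetime_centroid("2022-01-31T00:00:00", "2022-02-02", end_format="%Y-%m-%d")
print(result)
```

Each bound is parsed with its own format, so the two inputs do not need to share a representation.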
- class extraction_methods.plugins.datetime_bound_to_centroid.DatetimeBoundToCentroidInput(*, exists_key: str = '$', exists_delimiter: str = '.', start_datetime: str = '$start_datetime', start_format: str = '%Y-%m-%dT%H:%M:%S', end_datetime: str = '$end_datetime', end_format: str = '%Y-%m-%dT%H:%M:%S', output_key: str = 'datetime', output_format: str = '%Y-%m-%dT%H:%M:%SZ')
Bases:
Input
Model for Datetime Bound to Centroid Method Input.
- end_datetime: str
- end_format: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'end_datetime': FieldInfo(annotation=str, required=False, default='$end_datetime', description='End datetime bound.'), 'end_format': FieldInfo(annotation=str, required=False, default='%Y-%m-%dT%H:%M:%S', description='Format of end datetime.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'output_format': FieldInfo(annotation=str, required=False, default='%Y-%m-%dT%H:%M:%SZ', description='format of output.'), 'output_key': FieldInfo(annotation=str, required=False, default='datetime', description='key to output to.'), 'start_datetime': FieldInfo(annotation=str, required=False, default='$start_datetime', description='Start datetime bound.'), 'start_format': FieldInfo(annotation=str, required=False, default='%Y-%m-%dT%H:%M:%S', description='Format for start datetime.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- output_format: str
- output_key: str
- start_datetime: str
- start_format: str
extraction_methods.plugins.default module
Default Method
- class extraction_methods.plugins.default.DefaultExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
default
- Description:
Adds a set of default facets to the output.
Configuration Options:
- ``defaults``: Dictionary of defaults to be added
Example configuration:
.. code-block:: yaml

    method: default
    inputs:
      defaults:
        mip_era: CMIP6
- input_class
alias of
DefaultInput
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
- class extraction_methods.plugins.default.DefaultInput(*, exists_key: str = '$', exists_delimiter: str = '.', defaults: dict[str, Any])
Bases:
Input
Model for Default Method Input.
- defaults: dict[str, Any]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'defaults': FieldInfo(annotation=dict[str, Any], required=True, description='Defaults to be added.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
extraction_methods.plugins.dict_aggregator module
Dictionary Aggregator Method
- class extraction_methods.plugins.dict_aggregator.DictAggregatorExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
dict_aggregator
- Description:
Aggregates information within a dictionary.
Configuration Options:
- ``min``: list of terms for which the minimum of their aggregate should be returned
- ``max``: list of terms for which the maximum of their aggregate should be returned
- ``sum``: list of terms for which the sum of their aggregate should be returned
- ``list``: list of terms for which a list of their aggregate should be returned
- ``mean``: list of terms for which the mean of their aggregate should be returned
Configuration Example:
.. code-block:: yaml

    method: dict_aggregator
    inputs:
      min:
        - start_time
      max:
        - end_time
      sum:
        - size
      list:
        - term1
        - term2
- input_class
alias of
DictAggregatorInput
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
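The aggregation options can be pictured with a minimal sketch. It assumes the input term resolves to a dictionary of per-asset property dictionaries and that each option lists plain term names. The ``KeyOutputKey`` entries in the input model suggest the real method can also rename outputs, which this sketch does not attempt. ``AGGREGATORS`` and ``aggregate`` are hypothetical names.

```python
from statistics import mean
from typing import Any, Callable

# Map each aggregation name to the function applied to the collected values.
AGGREGATORS: dict[str, Callable[[list[Any]], Any]] = {
    "min": min,
    "max": max,
    "sum": sum,
    "mean": mean,
    "list": list,
}


def aggregate(assets: dict[str, dict[str, Any]], config: dict[str, list[str]]) -> dict[str, Any]:
    """Aggregate the named terms across every entry of assets."""
    output: dict[str, Any] = {}
    for agg_name, terms in config.items():
        for term in terms:
            # Collect the term from each asset that defines it.
            values = [asset[term] for asset in assets.values() if term in asset]
            if values:
                output[term] = AGGREGATORS[agg_name](values)
    return output


# Hypothetical assets keyed by asset id.
assets = {
    "a": {"start_time": "2020-01-01", "end_time": "2020-06-30", "size": 100},
    "b": {"start_time": "2019-05-01", "end_time": "2021-01-01", "size": 50},
}
config = {"min": ["start_time"], "max": ["end_time"], "sum": ["size"]}
summary = aggregate(assets, config)
print(summary)
```

ISO 8601 date strings sort lexicographically, so ``min`` and ``max`` work on them without parsing.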
- class extraction_methods.plugins.dict_aggregator.DictAggregatorInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str | dict[str, Any] = '$assets', min: list[KeyOutputKey] = [], max: list[KeyOutputKey] = [], sum: list[KeyOutputKey] = [], mean: list[KeyOutputKey] = [], bucket: list[KeyOutputKey] = [])
Bases:
Input
Model for Dictionary Aggregator Method Input.
- bucket: list[KeyOutputKey]
- input_term: str | dict[str, Any]
- max: list[KeyOutputKey]
- mean: list[KeyOutputKey]
- min: list[KeyOutputKey]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'bucket': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the list of their aggregate should be returned.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'input_term': FieldInfo(annotation=Union[str, dict[str, Any]], required=False, default='$assets', description='term for method to run on.'), 'max': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the maximum of their aggregate should be returned.'), 'mean': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the mean of their summed aggregate should be returned.'), 'min': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the minimum of their aggregate should be returned.'), 'sum': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the sum of their aggregate should be returned.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- sum: list[KeyOutputKey]
extraction_methods.plugins.elasticsearch_aggregation module
Elasticsearch Aggregation Method
- class extraction_methods.plugins.elasticsearch_aggregation.ElasticsearchAggregationExtract(**kwargs: Any)
Bases:
ExtractionMethod
Method:
elasticsearch_aggregation
- Description:
Using an ID, generates a summary of information for higher level entities.
Configuration Options:
- ``index``: Name of the index holding the STAC entities
- ``id_term``: Term used for aggregating the STAC entities
- ``client_kwargs``: Session parameters passed to `elasticsearch.Elasticsearch <https://elasticsearch-py.readthedocs.io/en/7.10.0/api.html>`_
- ``bbox``: list of terms for which their aggregate bbox should be returned
- ``min``: list of terms for which the minimum of their aggregate should be returned
- ``max``: list of terms for which the maximum of their aggregate should be returned
- ``sum``: list of terms for which the sum of their aggregate should be returned
- ``list``: list of terms for which a list of their aggregate should be returned
Configuration Example:
.. code-block:: yaml

    method: elasticsearch_aggregation
    inputs:
      index: ceda-index
      id_term: item_id
      client_kwargs:
        hosts: ['host1:9200', 'host2:9200']
      bbox:
        - bbox
      min:
        - start_time
      max:
        - end_time
      sum:
        - size
      list:
        - term1
        - term2
- base_query() dict[str, Any]
Base query to filter the results to a single collection.
- Returns:
base query
- Return type:
dict
- static basic_aggregation(agg_type: str, facet: KeyOutputKey) dict[str, Any]
Query to run a basic aggregation of the given type (for example ``min``) on a facet.
- Parameters:
agg_type (str) – type of aggregation
facet (KeyOutputKey) – facet to aggregate
- Returns:
basic aggregation query
- Return type:
dict
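For orientation, a metric aggregation in the Elasticsearch query DSL has the shape ``{name: {agg_type: {"field": field}}}``. The sketch below builds such clauses as plain dictionaries. The function name ``sketch_basic_aggregation`` and the ``properties.*`` field names are assumptions for illustration; the real method takes a ``KeyOutputKey`` facet rather than separate strings.

```python
from typing import Any


def sketch_basic_aggregation(agg_type: str, field: str, name: str) -> dict[str, Any]:
    """Return one named metric aggregation clause, e.g. min, max or sum over a field."""
    return {name: {agg_type: {"field": field}}}


# "size": 0 asks Elasticsearch for aggregation results only, without search hits.
query: dict[str, Any] = {"size": 0, "aggs": {}}
for agg_type, field in (("min", "properties.start_time"), ("max", "properties.end_time")):
    query["aggs"].update(sketch_basic_aggregation(agg_type, field, field.split(".")[-1]))

print(query)
```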
- construct_query() dict[str, Any]
Function to create the initial elasticsearch query.
- Returns:
aggregation query
- Return type:
dict
- extract_facet(aggregations: dict[str, Any], facet: KeyOutputKey) Any
Function to extract the given facets from the aggregation.
- Parameters:
aggregations (dict) – aggregations
facet – facet to be extracted
- Returns:
extracted facet
- Return type:
Any
- extract_facet_lists(query: dict[str, Any], aggregations: dict[str, Any], facets: list[KeyOutputKey]) dict[str, Any]
Function to extract the lists of given facets from the aggregation.
- Parameters:
query (dict) – attribute dictionary to update
aggregations (dict) – current generated properties
facets (list) – facets to be extracted
- Returns:
extracted list facets
- Return type:
dict
- extract_first_facet(properties: dict[str, Any], facet: KeyOutputKey) Any
Function to extract the given default facets from the first hit.
- Parameters:
properties (dict) – properties from first record
facet – current facet to be extracted
- Returns:
extracted facet
- Return type:
Any
- extract_metadata(query: dict[str, Any], result: dict[str, Any]) dict[str, Any]
Function to extract the required metadata from the returned query result.
- Parameters:
query (dict) – previous query
result (dict) – results from previous query
- Returns:
metadata
- Return type:
dict
- static facet_composite_aggregation(facet: KeyOutputKey) dict[str, Any]
Generate the composite aggregation for the facet.
- Parameters:
facet (KeyOutputKey) – facet to aggregate
- Returns:
composite aggregation query
- Return type:
dict
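A composite aggregation in the Elasticsearch query DSL pages through every distinct value of a field. Each response carries an ``after_key`` that is passed back as ``after`` to request the next page. A minimal sketch of the query shape follows; the function name, the field name and the page size are assumptions, not the package's values.

```python
from typing import Any, Optional


def sketch_composite_aggregation(
    name: str, field: str, size: int = 100, after: Optional[dict[str, Any]] = None
) -> dict[str, Any]:
    """Return a composite aggregation clause over the distinct values of one field."""
    composite: dict[str, Any] = {
        "size": size,
        "sources": [{name: {"terms": {"field": field}}}],
    }
    if after is not None:
        # Resume from the after_key returned with the previous page.
        composite["after"] = after
    return {name: {"composite": composite}}


first_page = sketch_composite_aggregation("model", "properties.model.keyword", size=50)
next_page = sketch_composite_aggregation(
    "model", "properties.model.keyword", size=50, after={"model": "UKESM1-0-LL"}
)
print(first_page)
print(next_page)
```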
- run(body: dict[str, Any]) dict[str, Any]
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
- class extraction_methods.plugins.elasticsearch_aggregation.ElasticsearchAggregationInput(*, exists_key: str = '$', exists_delimiter: str = '.', index: str, id_term: str, client_kwargs: dict[str, Any] = {}, search_query: dict[str, Any] = {'bool': {'must': [{'term': {'path': {'value': '$uri'}}}], 'must_not': [{'term': {'categories.keyword': {'value': 'hidden'}}}]}}, geo_bound: list[KeyOutputKey] = [], first: list[KeyOutputKey] = [], min: list[KeyOutputKey] = [], max: list[KeyOutputKey] = [], sum: list[KeyOutputKey] = [], mean: list[KeyOutputKey] = [], bucket: list[KeyOutputKey] = [], request_tiemout: int = 15, allow_multiple: bool = True, output_key: str = 'label')
Bases:
Input
Model for Elasticsearch Aggregation Input.
- allow_multiple: bool
- bucket: list[KeyOutputKey]
- client_kwargs: dict[str, Any]
- first: list[KeyOutputKey]
- geo_bound: list[KeyOutputKey]
- id_term: str
- index: str
- max: list[KeyOutputKey]
- mean: list[KeyOutputKey]
- min: list[KeyOutputKey]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'allow_multiple': FieldInfo(annotation=bool, required=False, default=True, description='True if multiple labels are allowed.'), 'bucket': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the list of their aggregate should be returned.'), 'client_kwargs': FieldInfo(annotation=dict[str, Any], required=False, default={}, description='Parameters passed to elasticsearch client.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'first': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description="list of terms for which the first record's value should be returned."), 'geo_bound': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the minimum of their aggregate should be returned.'), 'id_term': FieldInfo(annotation=str, required=True, description='Term used for agregating the STAC entities.'), 'index': FieldInfo(annotation=str, required=True, description='Name of the index holding the STAC entities.'), 'max': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the maximum of their aggregate should be returned.'), 'mean': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the mean of their summed aggregate should be returned.'), 'min': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the minimum of their aggregate should be returned.'), 'output_key': FieldInfo(annotation=str, required=False, default='label', description='key to output to.'), 'request_tiemout': FieldInfo(annotation=int, required=False, default=15, 
description='Time out for search.'), 'search_query': FieldInfo(annotation=dict[str, Any], required=False, default={'bool': {'must_not': [{'term': {'categories.keyword': {'value': 'hidden'}}}], 'must': [{'term': {'path': {'value': '$uri'}}}]}}, description='Session parameters passed to elasticsearch client.'), 'sum': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of terms for which the sum of their aggregate should be returned.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- output_key: str
- request_tiemout: int
- search_query: dict[str, Any]
- sum: list[KeyOutputKey]
extraction_methods.plugins.facet_map module
Facet Map Method
- class extraction_methods.plugins.facet_map.FacetMapExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
facet_map
- Description:
In some cases, you may wish to map the header attributes to different facets. This method takes a map and converts the facet labels into those specified.
Configuration Options:
- ``term_map``: Dictionary of terms to map.
Example Configuration:
.. code-block:: yaml

    method: facet_map
    inputs:
      term_map:
        old_key: new_key
        time_coverage_start: start_time
- input_class
alias of
FacetMapInput
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
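The mapping described above amounts to renaming keys. A minimal sketch, assuming keys that are absent from ``term_map`` are left unchanged; ``map_facets`` is a hypothetical helper:

```python
from typing import Any


def map_facets(body: dict[str, Any], term_map: dict[str, str]) -> dict[str, Any]:
    """Rename keys of body according to term_map, keeping unmapped keys as they are."""
    return {term_map.get(key, key): value for key, value in body.items()}


body = {"time_coverage_start": "2020-01-01", "model": "UKESM1-0-LL"}
mapped = map_facets(body, {"time_coverage_start": "start_time"})
print(mapped)
```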
- class extraction_methods.plugins.facet_map.FacetMapInput(*, exists_key: str = '$', exists_delimiter: str = '.', term_map: dict[str, str] = {})
Bases:
Input
Model for Facet Map Input.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'term_map': FieldInfo(annotation=dict[str, str], required=False, default={}, description='Dictionary of terms to be mapped.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- term_map: dict[str, str]
extraction_methods.plugins.facet_prefix module
Facet Prefix Method
- class extraction_methods.plugins.facet_prefix.FacetPrefixExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
facet_prefix
- Description:
In some cases, you may wish to add a prefix to some or all of the facets based on the vocabulary they’re from.
Configuration Options:
- ``prefix``: Prefix to be added.
- ``keys``: List of keys that require prefix.
Example Configuration:
.. code-block:: yaml

    method: facet_prefix
    inputs:
      prefix: cmip6
      keys:
        - start_time
        - model
- input_class
alias of
FacetPrefixInput
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
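A minimal sketch of prefixing selected facets. The ``:`` separator between prefix and key is an assumption, since the reference does not state how the two are joined, and ``prefix_facets`` is a hypothetical helper:

```python
from typing import Any


def prefix_facets(
    body: dict[str, Any], prefix: str, keys: list[str], separator: str = ":"
) -> dict[str, Any]:
    """Add prefix to the listed keys of body and leave all other keys untouched."""
    return {
        (f"{prefix}{separator}{key}" if key in keys else key): value
        for key, value in body.items()
    }


body = {"start_time": "2020-01-01", "model": "UKESM1-0-LL", "size": 1}
prefixed = prefix_facets(body, "cmip6", ["start_time", "model"])
print(prefixed)
```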
- class extraction_methods.plugins.facet_prefix.FacetPrefixInput(*, exists_key: str = '$', exists_delimiter: str = '.', prefix: str, keys: list[str])
Bases:
Input
Model for Facet Prefix Input.
- keys: list[str]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'keys': FieldInfo(annotation=list[str], required=True, description='list of keys that require prefix.'), 'prefix': FieldInfo(annotation=str, required=True, description='Prefix to be added.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- prefix: str
extraction_methods.plugins.general_function module
General Function Method
- class extraction_methods.plugins.general_function.Function(*, name: str, args: list[Any] = [], kwargs: dict[str, Any] = {})
Bases:
BaseModel
Model for Function.
- args: list[Any]
- kwargs: dict[str, Any]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'args': FieldInfo(annotation=list[Any], required=False, default=[], description='list of arguments for function.'), 'kwargs': FieldInfo(annotation=dict[str, Any], required=False, default={}, description='dictionary of key word arguments for function.'), 'name': FieldInfo(annotation=str, required=True, description='Name of function.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- name: str
- class extraction_methods.plugins.general_function.GeneralFunctionExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
general_function
- Description:
Accepts a dictionary. String values are popped from the dictionary and are put back into the dictionary with the ``key`` specified.
Configuration Options:
- ``function``: ``REQUIRED`` Function to be run, given as ``name``, ``args``, and ``kwargs``.
- ``delimiter``: Optional text delimiter to put between module/function names. Default: ``"."``
- ``output_key``: Optional name of the key to output to; otherwise the response is merged with the body.
Example Configuration:
.. code-block:: yaml

    method: general_function
    inputs:
      function:
        name: import.path.to.the.function
        args:
          - hello
          - world
        kwargs:
          hello: world
          foo: bar
- input_class
alias of
GeneralFunctionInput
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
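Running a function named by its import path can be sketched with ``importlib``. The sketch assumes the last component of ``name`` is the function and everything before it is the module path. ``call_function`` and ``run_sketch`` are hypothetical names, not the package's API.

```python
from importlib import import_module
from typing import Any, Optional


def call_function(
    name: str,
    args: Optional[list[Any]] = None,
    kwargs: Optional[dict[str, Any]] = None,
    delimiter: str = ".",
) -> Any:
    """Import the module part of name and call the function part with args and kwargs."""
    module_path, _, function_name = name.rpartition(delimiter)
    module = import_module(module_path.replace(delimiter, "."))
    return getattr(module, function_name)(*(args or []), **(kwargs or {}))


def run_sketch(body: dict[str, Any], function: dict[str, Any], output_key: str = "") -> dict[str, Any]:
    """Store the result under output_key, or merge it into body when no key is given."""
    result = call_function(function["name"], function.get("args"), function.get("kwargs"))
    if output_key:
        body[output_key] = result
    else:
        body.update(result)  # assumes the function returned a dict
    return body


body = run_sketch(
    {"id": "abc"},
    {"name": "json.dumps", "args": [{"b": 1, "a": 2}], "kwargs": {"sort_keys": True}},
    output_key="encoded",
)
print(body)
```

A name without a module part raises an error in this sketch, because there is nothing to import.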
- class extraction_methods.plugins.general_function.GeneralFunctionInput(*, exists_key: str = '$', exists_delimiter: str = '.', function: Function, delimiter: str = '.', output_key: str = '')
Bases:
Input
Model for General Function Input.
- delimiter: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'delimiter': FieldInfo(annotation=str, required=False, default='.', description='text delimiter to put between module/function names.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'function': FieldInfo(annotation=Function, required=True, description='Function to be run name maybe seperatated my delimieter.'), 'output_key': FieldInfo(annotation=str, required=False, default='', description='key to output to, else response will be merged with body.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- output_key: str
extraction_methods.plugins.geometry module
Geometry Method
- class extraction_methods.plugins.geometry.GeometryExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
geometry
- Description:
Accepts a dictionary of coordinate values and converts to RFC 7946 formatted geometry.
Configuration Options:
- ``type``: ``REQUIRED`` Type of geometry to be produced.
- ``coordinates``: ``REQUIRED`` list of coordinates to convert to geometry. Ordering is respected.
- ``output_key``: key to output to.
Example Configuration:
.. code-block:: yaml

    method: geometry
    inputs:
      type: line
      coordinates:
        - 0
        - 0
        - $lon_2
        - $lat_2
- get_coordinates(coordinate_type: str, coordinates: list[Any]) list[Any]
Get coordinates
- Parameters:
coordinate_type (str) – type of coordinates
coordinates (list) – list of coordinates
- Returns:
coordinates
- Return type:
list
- input_class
alias of
GeometryInput
- line(coordinates: list[list[str | float]]) list[list[float]]
Get line coordinates
- Parameters:
coordinates (list) – list of coordinates
- Returns:
coordinates
- Return type:
list
- multi(coordinate_type: str, coordinates: list[Any]) list[Any]
Get coordinates for a multi-part geometry
- Parameters:
coordinate_type (str) – type of coordinates
coordinates (list) – list of coordinates
- Returns:
coordinates
- Return type:
list
- point(coordinates: list[str | float]) list[float]
Get point coordinates
- Parameters:
coordinates (list) – list of coordinates
- Returns:
coordinates
- Return type:
list
- polygon(coordinates: list[list[str | float]]) list[list[list[float]]]
Get polygon coordinates
- Parameters:
coordinates (list) – list of coordinates
- Returns:
coordinates
- Return type:
list
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
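The shape of the output can be illustrated with a minimal sketch that builds RFC 7946 geometry objects for three of the supported types. It assumes ``$``-prefixed strings are looked up in ``body`` and that a polygon is given as a single ring, which is closed if the first and last positions differ, as RFC 7946 requires. All function names here are hypothetical and only mirror the documented method names.

```python
from typing import Any


def to_float(value: Any, body: dict[str, Any], exists_key: str = "$") -> float:
    """Resolve "$name" from body when needed and cast the value to float."""
    if isinstance(value, str) and value.startswith(exists_key):
        value = body[value[len(exists_key):]]
    return float(value)


def point(coordinates: list[Any], body: dict[str, Any]) -> list[float]:
    return [to_float(value, body) for value in coordinates]


def line(coordinates: list[list[Any]], body: dict[str, Any]) -> list[list[float]]:
    return [point(position, body) for position in coordinates]


def polygon(coordinates: list[list[Any]], body: dict[str, Any]) -> list[list[list[float]]]:
    ring = line(coordinates, body)
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # close the linear ring
    return [ring]


def build_geometry(geometry_type: str, coordinates: list[Any], body: dict[str, Any]) -> dict[str, Any]:
    """Return a GeoJSON geometry object for the given type and coordinates."""
    builders = {"Point": point, "LineString": line, "Polygon": polygon}
    return {"type": geometry_type, "coordinates": builders[geometry_type](coordinates, body)}


body = {"lon_2": 10, "lat_2": "5.5"}
line_geometry = build_geometry("LineString", [[0, 0], ["$lon_2", "$lat_2"]], body)
polygon_geometry = build_geometry("Polygon", [[0, 0], [1, 0], [1, 1]], body)
print(line_geometry)
print(polygon_geometry)
```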
- class extraction_methods.plugins.geometry.GeometryInput(*, exists_key: str = '$', exists_delimiter: str = '.', type: Literal['Point', 'LineString', 'Polygon', 'MultiPointString', 'MultiLineString', 'MultiPolygon'], coordinates: list[Any], output_key: str = 'geometry')
Bases:
Input
Model for Geometry Input.
- coordinates: list[Any]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'coordinates': FieldInfo(annotation=list[Any], required=True, description='list of coordinates to convert to geometry. Ordering is respected.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'output_key': FieldInfo(annotation=str, required=False, default='geometry', description='key to output to.'), 'type': FieldInfo(annotation=Literal[str, str, str, str, str, str], required=True, description='Type of geometry to be produced.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- output_key: str
- type: Literal['Point', 'LineString', 'Polygon', 'MultiPointString', 'MultiLineString', 'MultiPolygon']
extraction_methods.plugins.geometry_to_bbox module
Geometry to Bounding Box Method
- class extraction_methods.plugins.geometry_to_bbox.GeometryToBboxExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
geometry_to_bbox
- Description:
Accepts a geometry with a type and a list of coordinates and converts it to an RFC 7946, section 5 formatted bbox.
Configuration Options:
- ``geometry``: geometry to be converted to bbox. ``default: $geometry``
- ``output_key``: key to output to. ``default: bbox``
Example Configuration (YAML):
    method: geometry_to_bbox
    inputs:
      geometry:
        type: point
        coordinates:
          - 20
          - 0
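As a rough illustration of the conversion described above, the sketch below computes an RFC 7946 style bbox (``[west, south, east, north]``) from a flat list of ``[x, y]`` positions. The helper name ``bbox_from_positions`` is hypothetical and not part of the plugin; the plugin's own ``get_bbox`` may treat each geometry type differently.

```python
from typing import Sequence


def bbox_from_positions(positions: Sequence[Sequence[float]]) -> list[float]:
    """Return [west, south, east, north] for a sequence of [x, y] positions."""
    xs = [float(p[0]) for p in positions]
    ys = [float(p[1]) for p in positions]
    return [min(xs), min(ys), max(xs), max(ys)]


# A single point collapses to a zero-area box.
point_bbox = bbox_from_positions([[20, 0]])

# A LineString or a polygon ring is simply a longer list of positions.
ring_bbox = bbox_from_positions([[0, 0], [10, 0], [10, 5], [0, 5], [0, 0]])
```

This sketch ignores geometries that cross the antimeridian, which RFC 7946 handles as a special case.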
- get_bbox(coordinate_type: str, coordinates: list[Any]) list[float]
Get bbox from geometry
- Parameters:
coordinate_type (str) – type of coordinates
coordinates (list) – list of coordinates
- Returns:
bounding box of coordinates
- Return type:
list
- input_class
alias of
GeometryToBboxInput
- line(coordinates: list[list[float]]) list[float]
Get line bbox
- Parameters:
coordinates (list) – list of coordinates
- Returns:
bounding box of coordinates
- Return type:
list
- multi(coordinate_type: str, coordinates: list[Any]) list[float]
Get bbox of a multi-part geometry
- Parameters:
coordinate_type (str) – type of coordinates
coordinates (list) – list of coordinates
- Returns:
bounding box of coordinates
- Return type:
list
- point(coordinates: list[float]) list[float]
Get point bbox
- Parameters:
coordinates (list) – list of coordinates
- Returns:
bounding box of coordinates
- Return type:
list
- polygon(coordinates: list[list[float]]) list[float]
Get polygon bbox
- Parameters:
coordinates (list) – list of coordinates
- Returns:
bounding box of coordinates
- Return type:
list
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
- class extraction_methods.plugins.geometry_to_bbox.GeometryToBboxInput(*, exists_key: str = '$', exists_delimiter: str = '.', geometry: dict[str, Any] = '$geometry', output_key: str = 'bbox')
Bases:
Input
Model for Geometry to Bounding Box Input.
- geometry: dict[str, Any]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'geometry': FieldInfo(annotation=dict[str, Any], required=False, default='$geometry', description='geometry to be converted to bbox.'), 'output_key': FieldInfo(annotation=str, required=False, default='bbox', description='key to output to.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- output_key: str
extraction_methods.plugins.hash module
Hash Method
- class extraction_methods.plugins.hash.HashExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
hash
- Description:
Hashes input string.
Configuration Options:
- ``hash_str``: ``REQUIRED`` string to be hashed.
- ``output_key``: ``REQUIRED`` key to output to.
Example configuration (YAML):
    method: hash
    inputs:
      hash_str: $model
      output_key: hashed_terms
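The documentation does not state which hash algorithm the method uses. The sketch below assumes SHA-256 from the standard library purely for illustration; ``hash_term`` is a hypothetical helper, and the value of ``model`` is made up.

```python
import hashlib


def hash_term(hash_str: str) -> str:
    """Return a hex digest of the input string (SHA-256 is an assumption)."""
    return hashlib.sha256(hash_str.encode("utf-8")).hexdigest()


# "model" stands in for a previously extracted term referenced as $model.
body = {"model": "example-model"}
body["hashed_terms"] = hash_term(body["model"])
```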
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
- class extraction_methods.plugins.hash.HashInput(*, exists_key: str = '$', exists_delimiter: str = '.', hash_str: str, output_key: str)
Bases:
Input
Model for Hash Input.
- hash_str: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'hash_str': FieldInfo(annotation=str, required=True, description='string to be hashed.'), 'output_key': FieldInfo(annotation=str, required=True, description='key to output to.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- output_key: str
extraction_methods.plugins.iso19115 module
ISO 19115 Method
- class extraction_methods.plugins.iso19115.ISO19115Extract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
iso19115
- Description:
Takes a URL and requests it to retrieve the ISO 19115 record.
Configuration Options:
- ``url``: ``REQUIRED`` URL to record store.
- ``dates``: ``REQUIRED`` List of key and output key pairs for the date terms to retrieve from the response.
- ``request_timeout``: request time out. ``default: 15``
Example configuration (YAML):
    method: iso19115
    inputs:
      url: $url
      dates:
        - key: './/gml:beginPosition'
          output_key: start_datetime
- input_class
alias of
ISO19115Input
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
- class extraction_methods.plugins.iso19115.ISO19115Input(*, exists_key: str = '$', exists_delimiter: str = '.', url: str, dates: list[KeyOutputKey], request_timeout: int = 15)
Bases:
Input
Model for ISO19115 Date Input.
- dates: list[KeyOutputKey]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'dates': FieldInfo(annotation=list[KeyOutputKey], required=True, description='list of dates to extract.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'request_timeout': FieldInfo(annotation=int, required=False, default=15, description='request time out.'), 'url': FieldInfo(annotation=str, required=True, description='Url for record store.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- request_timeout: int
- url: str
extraction_methods.plugins.iso_date module
ISO Date Method
- class extraction_methods.plugins.iso_date.DateTerm(*, input_term: str, format: str = '%Y-%m-%dT%H:%M:%SZ', output_key: str = 'datetime')
Bases:
BaseModel
Model for Date terms with format.
- format: str
- input_term: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'format': FieldInfo(annotation=str, required=False, default='%Y-%m-%dT%H:%M:%SZ', description='Format of the date.'), 'input_term': FieldInfo(annotation=str, required=True, description='Term to run method on.'), 'output_key': FieldInfo(annotation=str, required=False, default='datetime', description='Key to output to.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- output_key: str
- class extraction_methods.plugins.iso_date.ISODateExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
iso_date
- Description:
Takes the source dict and the key used to access the date, and converts the date to ISO 8601 format, e.g.
- YYYY-MM-DDTHH:MM:SS.ffffff, if microsecond is not 0
- YYYY-MM-DDTHH:MM:SS, if microsecond is 0
If the date format cannot be parsed, the date is removed from the source dict and an error is logged.
Configuration Options:
- ``date_terms``: ``REQUIRED`` List of keys to the date values. Using a list allows processing of multiple dates.
- ``format``: Optional format string. Default behaviour uses `dateutil.parser.parse <https://dateutil.readthedocs.io/en/stable/parser.html#dateutil.parser.parse>`_. If a format string is supplied, this will change to use `datetime.datetime.strptime <https://docs.python.org/3/library/datetime.html#datetime.datetime.strptime>`_.
Example Configuration (YAML):
    method: iso_date
    inputs:
      dates:
        - key: $datetime
          output_key: date
          format: "%Y-%m-%dT%H:%M:%S"
        - key: 2012-12-12
          format: "%Y-%m-%d"
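The two output shapes described above match what ``datetime.isoformat()`` produces. The sketch below shows only the explicit-format branch using the standard library; ``to_iso`` is a hypothetical helper, and the ``dateutil`` fallback and error handling are left out.

```python
from datetime import datetime


def to_iso(value: str, fmt: str) -> str:
    """Parse value with an explicit strptime format and return ISO 8601 text."""
    return datetime.strptime(value, fmt).isoformat()


# Microsecond is 0, so the fractional part is omitted.
whole_seconds = to_iso("2012-12-12T10:30:00", "%Y-%m-%dT%H:%M:%S")

# Microsecond is not 0, so six fractional digits are emitted.
with_micros = to_iso("2012-12-12T10:30:00.250000", "%Y-%m-%dT%H:%M:%S.%f")

# A date-only input is placed at midnight.
date_only = to_iso("2012-12-12", "%Y-%m-%d")
```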
- input_class
alias of
ISODateInput
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
- class extraction_methods.plugins.iso_date.ISODateInput(*, exists_key: str = '$', exists_delimiter: str = '.', date_terms: list[DateTerm] = [])
Bases:
Input
Model for ISO Date Input.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'date_terms': FieldInfo(annotation=list[DateTerm], required=False, default=[], description='List of date terms.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
extraction_methods.plugins.json_file module
JSON File Method
- class extraction_methods.plugins.json_file.JsonFileExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
json_file
- Description:
Takes a list of keys to extract from the JSON file.
Configuration Options:
- ``path``: ``REQUIRED`` Path to directory of JSON files or single JSON file.
- ``properties``: ``REQUIRED`` List of properties to extract.
Example configuration (YAML):
    method: json_file
    inputs:
      path: /path/to/file.json
      properties:
        - key: MIP_ERA
          output_key: mip_era
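A minimal sketch of the single-file case, assuming each property is a mapping with ``key`` and ``output_key`` and that keys are looked up at the top level of the JSON document. ``extract_properties`` and the sample file contents are hypothetical; the plugin's ``extract_terms`` may behave differently, for example with nested keys or directories.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def extract_properties(path: Path, properties: list[dict[str, str]]) -> dict[str, Any]:
    """Read one JSON file and copy each requested key to its output key."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return {
        prop["output_key"]: data[prop["key"]]
        for prop in properties
        if prop["key"] in data
    }


# Write a throwaway JSON file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "file.json"
    json_path.write_text(json.dumps({"MIP_ERA": "CMIP6", "other": 1}), encoding="utf-8")
    extracted = extract_properties(json_path, [{"key": "MIP_ERA", "output_key": "mip_era"}])
```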
- extract_terms(path: Path) dict[str, Any]
Extract terms from JSON file(s) at path.
- Parameters:
path (Path) – path to file
- Returns:
extracted terms
- Return type:
dict
- find_and_extract() dict[str, Any]
Find and extract from JSON files.
- Returns:
extracted terms
- Return type:
dict
- input_class
alias of
JsonFileInput
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
- class extraction_methods.plugins.json_file.JsonFileInput(*, exists_key: str = '$', exists_delimiter: str = '.', path: str, properties: list[KeyOutputKey])
Bases:
Input
Model for JSON File Input.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'path': FieldInfo(annotation=str, required=True, description='Path to directory of JSON files or single JSON file.'), 'properties': FieldInfo(annotation=list[KeyOutputKey], required=True, description='list of properties to extract.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- path: str
- properties: list[KeyOutputKey]
extraction_methods.plugins.lambda module
Lambda Method
- class extraction_methods.plugins.lambda.LambdaExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
lambda
- Description:
Accepts a dictionary. String values are popped from the dictionary and are put back into the dictionary with the ``key`` specified.
Configuration Options:
- ``function``: ``REQUIRED`` lambda function to be run.
- ``output_key``: Optional name of the key to output to; otherwise the response will be merged.
- ``args``: Optional list of arguments for function. Use $ for previously extracted terms.
- ``kwargs``: Optional dictionary of keyword arguments for function. Use $ for previously extracted terms.
Example Configuration (YAML):
    method: lambda
    inputs:
      function: 'lambda x: x * x'
      args:
        - hello
        - $world
      kwargs:
        hello: world
        goodbye: all
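One plausible reading of this method is that the ``function`` string is evaluated and then called with the resolved ``args`` and ``kwargs``. The sketch below follows that assumption; ``run_lambda`` and the ``$`` resolution shown are hypothetical, and ``eval`` should only ever see trusted configuration.

```python
from typing import Any


def run_lambda(function: str, args: list[Any], kwargs: dict[str, Any]) -> Any:
    """Evaluate a lambda given as a string and call it with the supplied arguments."""
    func = eval(function)  # only safe for trusted configuration
    return func(*args, **kwargs)


body = {"world": 4}

# Resolve "$world" against previously extracted terms before the call (assumed behaviour).
args = [body[a[1:]] if isinstance(a, str) and a.startswith("$") else a for a in ["$world"]]

body["label"] = run_lambda("lambda x: x * x", args, {})
```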
- input_class
alias of
LambdaInput
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
- class extraction_methods.plugins.lambda.LambdaInput(*, exists_key: str = '$', exists_delimiter: str = '.', function: str, args: list[Any] = [], kwargs: dict[str, Any] = {}, output_key: str = 'label')
Bases:
Input
Model for Lambda Input.
- args: list[Any]
- function: str
- kwargs: dict[str, Any]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'args': FieldInfo(annotation=list[Any], required=False, default=[], description='list of arguments for function.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'function': FieldInfo(annotation=str, required=True, description='lambda function to be run.'), 'kwargs': FieldInfo(annotation=dict[str, Any], required=False, default={}, description='dictionary of key word arguments for function.'), 'output_key': FieldInfo(annotation=str, required=False, default='label', description='key to output to.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- output_key: str
extraction_methods.plugins.netcdf module
NetCDF Method
- class extraction_methods.plugins.netcdf.NetCDFExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
netcdf
Description: Processes XML documents to extract metadata
Configuration Options:
- ``extraction_keys``: List of keys to retrieve from the document.
- ``filter_expr``: Regex to match against files, limiting the attempts to known files.
- ``namespaces``: Map of namespaces.
- Extraction Keys:
Extraction keys should be a map with the following entries:
- ``output_key``: Name of the output attribute.
- ``key``: Access key to extract the required data. Passed to xml.etree.ElementTree.find(), which also supports XPath-formatted accessors.
- ``attribute``: Selects an attribute of the element. In the absence of this value, the default behaviour is to access the text value of the key.
Example configuration (YAML):
    method: xml
    inputs:
      filter_expr: '.manifest$'
      extraction_keys:
        - name: start_datetime
          key: './/gml:beginPosition'
          attribute: start
- input_class
alias of
NetCDFInput
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
- class extraction_methods.plugins.netcdf.NetCDFInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri', variable_id: str = '$uri', variable_attributes: list[KeyOutputKey] = [], global_attributes: list[KeyOutputKey] = [], cf_attributes: list[KeyOutputKey] = [], rio_attributes: list[KeyOutputKey] = [])
Bases:
Input
Model for NetCDF Input.
- cf_attributes: list[KeyOutputKey]
- global_attributes: list[KeyOutputKey]
- input_term: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'cf_attributes': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of cf attributes to extract.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'global_attributes': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of global attributes to extract.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='term for method to run on.'), 'rio_attributes': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of rio attributes to extract.'), 'variable_attributes': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of variable attributes to extract.'), 'variable_id': FieldInfo(annotation=str, required=False, default='$uri', description='lambda function to be run.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- rio_attributes: list[KeyOutputKey]
- variable_attributes: list[KeyOutputKey]
- variable_id: str
extraction_methods.plugins.open_zip module
Open Zip Method
- class extraction_methods.plugins.open_zip.ZipExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
open_zip
- Description:
Open a zip file and read inner files
Configuration Options:
- ``input_term``: Term for method to run on. ``default: $uri``
- ``inner_files``: List of inner zipped files to be read.
- ``output_key``: key to output to.
Example configuration (YAML):
    method: open_zip
    inputs:
      input_term: /path/to/a/file
      inner_files:
        - key: hello.txt
          output_key: world
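A minimal sketch of reading inner files from a zip archive with the standard library, assuming each ``inner_files`` entry maps a member name (``key``) to an ``output_key`` and that the contents are UTF-8 text. ``read_inner_files`` is a hypothetical helper, not the plugin's API.

```python
import io
import zipfile


def read_inner_files(zip_bytes: bytes, inner_files: list[dict[str, str]]) -> dict[str, str]:
    """Read each requested member of a zip archive as UTF-8 text."""
    output = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for inner in inner_files:
            output[inner["output_key"]] = archive.read(inner["key"]).decode("utf-8")
    return output


# Build a small archive in memory so the example needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hi there")

contents = read_inner_files(buffer.getvalue(), [{"key": "hello.txt", "output_key": "world"}])
```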
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
- class extraction_methods.plugins.open_zip.ZipInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri', inner_files: list[KeyOutputKey] = [], output_key: str = '')
Bases:
Input
Model for Zip Input.
- check_root_read() Self
- inner_files: list[KeyOutputKey]
- input_term: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'inner_files': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='list of inner zipped files to be read.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='term for method to run on.'), 'output_key': FieldInfo(annotation=str, required=False, default='', description='key to output to.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- output_key: str
extraction_methods.plugins.path_parts module
Path Parts Method
- class extraction_methods.plugins.path_parts.PathPartsExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
path_parts
- Description:
Extracts the parts of a given path, skipping the first ``skip`` top-level parts.
Configuration Options:
- ``path``: path for method to run on. ``default: $uri``
- ``skip``: The number of path parts to skip. ``default: 0``
Example configuration (YAML):
    method: path_parts
    inputs:
      path: $uri
      skip: 2
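The effect of ``skip`` can be sketched with ``pathlib``. The helper ``path_parts`` and the sample path are made up for illustration, and the way the plugin names its output keys is not shown here because the documentation does not describe it.

```python
from pathlib import PurePosixPath


def path_parts(path: str, skip: int = 0) -> list[str]:
    """Split a POSIX-style path into parts, dropping the first `skip` parts."""
    # The root "/" is reported as its own part by pathlib, so filter it out.
    parts = [part for part in PurePosixPath(path).parts if part != "/"]
    return parts[skip:]


parts = path_parts("/top/second/data/file.nc", skip=2)
```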
- input_class
alias of
PathPartsInput
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
- class extraction_methods.plugins.path_parts.PathPartsInput(*, exists_key: str = '$', exists_delimiter: str = '.', path: str = '$uri', skip: int = 0)
Bases:
Input
Model for Path Parts Input.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'path': FieldInfo(annotation=str, required=False, default='$uri', description='path for method to run on.'), 'skip': FieldInfo(annotation=int, required=False, default=0, description='number of path parts to skip.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- path: str
- skip: int
extraction_methods.plugins.regex module
Regex Method
- class extraction_methods.plugins.regex.RegexExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
regex
- Description:
Takes an input string and a regex with named capture groups and returns a dictionary of the values extracted using the named capture groups.
Configuration Options:
- ``input_term``: Term for regex to be run on. ``default: $uri``
- ``regex``: ``REQUIRED`` The regular expression to match against.
Example configuration (YAML):
    method: regex
    inputs:
      regex: ^(?:[^_]*_){2}(?P<datetime>\d*)
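Named capture groups map directly onto a dictionary through ``groupdict()``. The sketch below uses ``\d*`` for the digit run, which is an assumption about the intended expression, and a made-up file name as the input term.

```python
import re

# Skip two underscore-delimited fields, then capture the following digits.
pattern = re.compile(r"^(?:[^_]*_){2}(?P<datetime>\d*)")

match = pattern.search("sensor_tile_20200101_v1.nc")

# Each named group becomes a key in the extracted dictionary.
extracted = match.groupdict() if match else {}
```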
- input_class
alias of
RegexInput
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
- class extraction_methods.plugins.regex.RegexInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri', regex: str)
Bases:
Input
Model for Regex Input.
- input_term: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='term for method to run on.'), 'regex': FieldInfo(annotation=str, required=True, description='The regular expression to match against.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- regex: str
extraction_methods.plugins.regex_label module
Regex Label Method
- class extraction_methods.plugins.regex_label.RegexLabelExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
regex_label
- Description:
Adds a label if the input term fully matches the regex.
Configuration Options:
- ``input_term``: term for method to run on.
- ``label``: ``REQUIRED`` Label to add if regex passes.
- ``regex``: ``REQUIRED`` Regex to test against.
- ``allow_multiple``: True if multiple labels are allowed.
- ``output_key``: Term for method to output to.
Example configuration (YAML):
    method: regex_label
    inputs:
      label: metadata
      regex: README
      allow_multiple: true
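The full-match test can be sketched with ``re.fullmatch``. How ``allow_multiple`` shapes the output is an assumption here (a list of labels when true, a single value otherwise), and ``add_label`` is a hypothetical helper.

```python
import re
from typing import Any


def add_label(body: dict[str, Any], input_term: str, label: str, regex: str,
              allow_multiple: bool = True, output_key: str = "label") -> dict[str, Any]:
    """Add `label` under `output_key` when `input_term` fully matches `regex`."""
    if re.fullmatch(regex, input_term) is None:
        return body
    if allow_multiple:
        # Assumed behaviour: labels accumulate in a list.
        body.setdefault(output_key, []).append(label)
    else:
        body[output_key] = label
    return body


labelled = add_label({}, "README", "metadata", "README")

# "README.md" only partially matches, so no label is added.
unlabelled = add_label({}, "README.md", "metadata", "README")
```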
- input_class
alias of
RegexLabelInput
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
- class extraction_methods.plugins.regex_label.RegexLabelInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri', label: str, regex: str, allow_multiple: bool = True, output_key: str = 'label')
Bases:
Input
Model for Regex Label Input.
- allow_multiple: bool
- input_term: str
- label: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'allow_multiple': FieldInfo(annotation=bool, required=False, default=True, description='True if multiple labels are allowed.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='term for method to run on.'), 'label': FieldInfo(annotation=str, required=True, description='Label to add if regex passes.'), 'output_key': FieldInfo(annotation=str, required=False, default='label', description='Term for method to output to.'), 'regex': FieldInfo(annotation=str, required=True, description='Regex to test against.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- output_key: str
- regex: str
extraction_methods.plugins.regex_rename module
Regex Rename Method
- class extraction_methods.plugins.regex_rename.RegexOutputKey(*, exists_key: str = '$', exists_delimiter: str = '.', regex: str, output_key: str)
Bases:
Input
Model for Regex.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'output_key': FieldInfo(annotation=str, required=True, description='Term for method to output to.'), 'regex': FieldInfo(annotation=str, required=True, description='Regex to test against.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- output_key: str
- regex: str
- class extraction_methods.plugins.regex_rename.RegexRenameExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
regex_rename
- Description:
Takes a list of regex and output key combinations. Any existing properties that fully match a regex are renamed to the output key. Later regexes take precedence.
Configuration Options:
- ``regex_swaps``: ``REQUIRED`` Regex and output key combinations.
- ``delimiter``: delimiter for nested term.
Example configuration (YAML):
    method: regex_rename
    inputs:
      regex_swaps:
        - regex: README
          output_key: metadata
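A minimal sketch of the renaming rule for top-level keys only, assuming each swap is a mapping with ``regex`` and ``output_key``. ``rename_matching`` is hypothetical, and nested keys (the ``delimiter`` option) are not handled.

```python
import re
from typing import Any


def rename_matching(body: dict[str, Any], regex_swaps: list[dict[str, str]]) -> dict[str, Any]:
    """Rename top-level keys that fully match a regex; later swaps overwrite earlier ones."""
    for swap in regex_swaps:
        # Collect matches first so the dict is not mutated while iterating over it.
        for key in [k for k in body if re.fullmatch(swap["regex"], k)]:
            body[swap["output_key"]] = body.pop(key)
    return body


renamed = rename_matching(
    {"README": "text", "size": 10},
    [{"regex": "README", "output_key": "metadata"}],
)
```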
- input_class
alias of
RegexRenameInput
- matching_keys(keys: KeysView[str], key_regex: str) list[str]
Find all keys that match regex
- Parameters:
keys (KeysView) – dictionary keys to test
key_regex (str) – regex to test against
- Returns:
matching keys
- Return type:
list
- rename(body: dict[str, Any], key_parts: list[str], output_key: str) dict[str, Any]
Rename terms
- Parameters:
body (dict) – current body
key_parts (list) – key parts separated by delimiter
output_key (str) – key to rename the term to
- Returns:
updated body
- Return type:
dict
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
- class extraction_methods.plugins.regex_rename.RegexRenameInput(*, exists_key: str = '$', exists_delimiter: str = '.', regex_swaps: list[RegexOutputKey], delimiter: str = '')
Bases:
Input
Model for Regex Rename Input.
- delimiter: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'delimiter': FieldInfo(annotation=str, required=False, default='', description='delimiter for nested term.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'regex_swaps': FieldInfo(annotation=list[RegexOutputKey], required=True, description='Regex and output key combinations.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- regex_swaps: list[RegexOutputKey]
extraction_methods.plugins.regex_type_cast module
Regex Type Cast Method
- class extraction_methods.plugins.regex_type_cast.RegexCastType(*, exists_key: str = '$', exists_delimiter: str = '.', regex: str, cast_type: str)
Bases:
Input
Model for Regex Cast Type.
- cast_type: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'cast_type': FieldInfo(annotation=str, required=True, description='Python type to cast to.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'regex': FieldInfo(annotation=str, required=True, description='Regex to test against.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- regex: str
- class extraction_methods.plugins.regex_type_cast.RegexTypeCastExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
regex_type_cast
- Description:
Takes a list of regex and cast type combinations. Any existing properties that fully match a regex are cast to the associated type.
Configuration Options:
- ``regex_casts``: ``REQUIRED`` Regex and cast type combinations.
Example configuration (YAML):
    method: regex_type_cast
    inputs:
      regex_casts:
        - regex: cloud_cover
          cast_type: int
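The casting rule can be sketched as follows, assuming ``cast_type`` names a Python builtin and each cast is a mapping with ``regex`` and ``cast_type``. ``cast_matching`` and the ``CAST_TYPES`` lookup table are hypothetical; the plugin may resolve type names differently.

```python
import re
from typing import Any

# Only a small, explicit set of target types is supported in this sketch.
CAST_TYPES = {"int": int, "float": float, "str": str}


def cast_matching(body: dict[str, Any], regex_casts: list[dict[str, str]]) -> dict[str, Any]:
    """Cast the value of every top-level key that fully matches a regex."""
    for cast in regex_casts:
        caster = CAST_TYPES[cast["cast_type"]]
        for key in [k for k in body if re.fullmatch(cast["regex"], k)]:
            body[key] = caster(body[key])
    return body


cast_body = cast_matching(
    {"cloud_cover": "12", "title": "scene"},
    [{"regex": "cloud_cover", "cast_type": "int"}],
)
```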
- input_class
alias of
RegexTypeCastInput
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
- class extraction_methods.plugins.regex_type_cast.RegexTypeCastInput(*, exists_key: str = '$', exists_delimiter: str = '.', regex_casts: list[RegexCastType])
Bases:
Input
Model for Regex Cast Type Input.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'regex_casts': FieldInfo(annotation=list[RegexCastType], required=True, description='Regex and cast type combinations.')}
Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].
This replaces Model.__fields__ from Pydantic V1.
- regex_casts: list[RegexCastType]
extraction_methods.plugins.remove module
Remove Method
- class extraction_methods.plugins.remove.RemoveExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
remove
- Description:
Removes keys from body.
Configuration Options:
- ``keys``: ``REQUIRED`` list of keys to remove.
- ``delimiter``: delimiter for nested key. ``default: .``
Example Configuration: .. code-block:: yaml
method: remove inputs:
keys: - hello - world
- input_class
alias of
RemoveInput
- matching_keys(keys: KeysView[str], key_regex: str) list[str]
Find all keys that match a regex.
- Parameters:
keys (KeysView) – dictionary keys to test
key_regex (str) – regex to test against
- Returns:
matching keys
- Return type:
list
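A minimal sketch of a helper with the matching_keys signature follows. The real matching rule is not documented here, so the use of re.fullmatch (rather than re.match or re.search) is an assumption.

```python
import re
from collections.abc import KeysView


def matching_keys(keys: KeysView[str], key_regex: str) -> list[str]:
    """Return the keys that fully match ``key_regex`` (assumed rule)."""
    pattern = re.compile(key_regex)
    return [key for key in keys if pattern.fullmatch(key)]


body = {"hello": 1, "hello_world": 2, "world": 3}
print(matching_keys(body.keys(), r"hello.*"))  # ['hello', 'hello_world']
```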
- remove_key(body: dict[str, Any], key_parts: list[str]) dict[str, Any]
Remove nested terms
- Parameters:
body (dict) – current body
key_parts (list) – key parts separated by delimiter
- Returns:
updated body
- Return type:
dict
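Removing a nested term can be sketched as a short recursion over the key parts. This is an illustrative sketch only: it uses exact key lookups, whereas the plugin may match each part as a regex via matching_keys.

```python
from typing import Any


def remove_key(body: dict[str, Any], key_parts: list[str]) -> dict[str, Any]:
    """Remove the nested key addressed by ``key_parts`` (exact-match sketch)."""
    if not key_parts:
        return body
    head, *rest = key_parts
    if not rest:
        # Last part: drop the key if it is present.
        body.pop(head, None)
    elif isinstance(body.get(head), dict):
        # Descend into the nested dictionary for the remaining parts.
        remove_key(body[head], rest)
    return body


body = {"a": {"b": 1, "c": 2}, "d": 3}
# "a.b" split on the default "." delimiter gives ["a", "b"].
print(remove_key(body, "a.b".split(".")))  # {'a': {'c': 2}, 'd': 3}
```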
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
- class extraction_methods.plugins.remove.RemoveInput(*, exists_key: str = '$', exists_delimiter: str = '.', keys: list[str], delimiter: str = '.')
Bases:
Input
Model for Remove Input.
- delimiter: str
- keys: list[str]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'delimiter': FieldInfo(annotation=str, required=False, default='.', description='delimiter for nested term.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'keys': FieldInfo(annotation=list[str], required=True, description='list of keys to remove.')}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo.
This replaces Model.__fields__ from Pydantic V1.
extraction_methods.plugins.stac_extension module
STAC Extension Method
- class extraction_methods.plugins.stac_extension.STACExtension(*, url: str, prefix: str, properties: list[str])
Bases:
BaseModel
Model for STAC Extension.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'prefix': FieldInfo(annotation=str, required=True, description='Extension prefix.'), 'properties': FieldInfo(annotation=list[str], required=True, description='Extension properties.'), 'url': FieldInfo(annotation=str, required=True, description='Extension URL.')}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo.
This replaces Model.__fields__ from Pydantic V1.
- prefix: str
- properties: list[str]
- url: str
- class extraction_methods.plugins.stac_extension.STACExtensionExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
stac_extension
- Description:
Accepts a list of extensions, each containing a URL, a prefix and a list of properties.
Configuration Options:
- ``extensions``: ``REQUIRED`` List of extensions.
Example Configuration:
method: stac_extension
inputs:
  extensions:
    - url: hello.com/v1.0.0/world.json
      prefix: hello
      properties:
        - foo
        - bar
- input_class
alias of
STACExtensionInput
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
- class extraction_methods.plugins.stac_extension.STACExtensionInput(*, exists_key: str = '$', exists_delimiter: str = '.', extensions: list[STACExtension])
Bases:
Input
Model for STAC Extension Input.
- extensions: list[STACExtension]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'extensions': FieldInfo(annotation=list[STACExtension], required=True, description='List of extensions.')}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo.
This replaces Model.__fields__ from Pydantic V1.
extraction_methods.plugins.string_template module
String Template Method
- class extraction_methods.plugins.string_template.StringTemplateExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
string_template
- Description:
Accepts a template and an output_key. Terms are substituted into the template.
Configuration Options:
- ``template``: ``REQUIRED`` Template to follow.
- ``descructive``: True if terms should be removed after templating.
- ``output_key``: ``REQUIRED`` Key to output to.
Example Configuration:
method: string_template
inputs:
  template: "{hello}/{goodbye}/{hello}/bonjour.html"
  output_key: manifest_url
- input_class
alias of
StringTemplateInput
- run(body: dict[str, Any]) Any
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
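The templating step can be sketched with Python's built-in string formatting. This is a sketch under assumptions, not the plugin's code: it assumes the template uses str.format-style placeholders filled from the body, and that the removal option drops every term named in the template. The function name is hypothetical; the keyword keeps the documented ``descructive`` spelling.

```python
from string import Formatter
from typing import Any


def string_template(
    body: dict[str, Any],
    template: str,
    output_key: str,
    descructive: bool = False,
) -> dict[str, Any]:
    """Fill ``template`` from ``body`` and store the result under ``output_key``."""
    # Collect the placeholder names used in the template, e.g. {"hello", "goodbye"}.
    fields = {name for _, name, _, _ in Formatter().parse(template) if name}
    body[output_key] = template.format(**body)
    if descructive:
        # Assumed meaning of the option: remove the terms that were templated.
        for name in fields:
            if name != output_key:
                body.pop(name, None)
    return body


template = "{hello}/{goodbye}/{hello}/bonjour.html"
kept = string_template({"hello": "a", "goodbye": "b"}, template, "manifest_url")
removed = string_template(
    {"hello": "a", "goodbye": "b"}, template, "manifest_url", descructive=True
)
print(kept["manifest_url"])  # a/b/a/bonjour.html
print(removed)               # {'manifest_url': 'a/b/a/bonjour.html'}
```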
- class extraction_methods.plugins.string_template.StringTemplateInput(*, exists_key: str = '$', exists_delimiter: str = '.', template: str, descructive: bool = False, output_key: str)
Bases:
Input
Model for String Template Input.
- descructive: bool
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'descructive': FieldInfo(annotation=bool, required=False, default=False, description='True if terms should be removed after templating.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'output_key': FieldInfo(annotation=str, required=True, description='key to output to.'), 'template': FieldInfo(annotation=str, required=True, description='Template to follow.')}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo.
This replaces Model.__fields__ from Pydantic V1.
- output_key: str
- template: str
extraction_methods.plugins.xml module
XML Method
- class extraction_methods.plugins.xml.XMLExtract(*args: Any, **kwargs: Any)
Bases:
ExtractionMethod
Method:
xml
- Description:
Processes XML documents to extract metadata.
Configuration Options:
- ``input_term``: Term for method to run on.
- ``properties``: ``REQUIRED`` List of properties to retrieve from the document.
- ``namespaces``: ``REQUIRED`` Map of namespaces.
- Extraction Keys:
Extraction keys should be a map.
- ``key``: Key of the property. Passed to xml.etree.ElementTree.find(); XPath-formatted accessors are also supported.
- ``output_key``: Key to output to.
- ``attribute``: Selects an attribute of the matched element. If omitted, the default behaviour is to access the text value of the element.
Example Configuration:
method: xml
inputs:
  properties:
    - output_key: start_datetime
      key: './/gml:beginPosition'
      attribute: start
- run(body: dict[str, Any]) dict[str, Any]
Run the method.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
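The extraction keys can be illustrated with a short standard-library sketch. It is not the plugin's implementation: the sample document, the ``start`` attribute, the function name and the fallback from an empty output_key to the key are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the "start" attribute exists only for this example.
DOCUMENT = """\
<root xmlns:gml="http://www.opengis.net/gml">
  <gml:beginPosition start="2020-01-01">2020-01-01T00:00:00Z</gml:beginPosition>
</root>
"""


def extract_properties(
    xml_text: str,
    properties: list[dict[str, str]],
    namespaces: dict[str, str],
) -> dict[str, str]:
    """Read each configured property from the document (illustrative sketch)."""
    root = ET.fromstring(xml_text)
    output = {}
    for prop in properties:
        # The namespaces map resolves prefixes such as "gml" in the key.
        element = root.find(prop["key"], namespaces)
        if element is None:
            continue
        attribute = prop.get("attribute", "")
        # With an attribute, read it; otherwise fall back to the element text.
        value = element.get(attribute) if attribute else element.text
        output[prop.get("output_key") or prop["key"]] = value
    return output


namespaces = {"gml": "http://www.opengis.net/gml"}
properties = [
    {"key": ".//gml:beginPosition", "output_key": "start_datetime"},
    {"key": ".//gml:beginPosition", "output_key": "start_date", "attribute": "start"},
]
extracted = extract_properties(DOCUMENT, properties, namespaces)
print(extracted)
```

The first property returns the element text and the second returns the value of the ``start`` attribute, which mirrors the default and attribute-based behaviours described for the extraction keys.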
- class extraction_methods.plugins.xml.XMLInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri', properties: list[XMLProperty], namespaces: dict[str, str])
Bases:
Input
Model for XML Input.
- input_term: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='Term for method to run on.'), 'namespaces': FieldInfo(annotation=dict[str, str], required=True, description='Map of namespaces.'), 'properties': FieldInfo(annotation=list[XMLProperty], required=True, description='List of properties to retrieve from the document.')}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo.
This replaces Model.__fields__ from Pydantic V1.
- namespaces: dict[str, str]
- properties: list[XMLProperty]
- class extraction_methods.plugins.xml.XMLProperty(*, key: str, output_key: str = '', attribute: str = '')
Bases:
KeyOutputKey
Model for XML property.
- attribute: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'attribute': FieldInfo(annotation=str, required=False, default='', description='Attribute of the XML property.'), 'key': FieldInfo(annotation=str, required=True), 'output_key': FieldInfo(annotation=str, required=False, default='')}
Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo.
This replaces Model.__fields__ from Pydantic V1.