extraction_methods.plugins.assets.backends package
Submodules
extraction_methods.plugins.assets.backends.elasticsearch module
Elasticsearch Assets Backend
- class extraction_methods.plugins.assets.backends.elasticsearch.ElasticsearchAssets(*args: Any, **kwargs: Any)
Bases: Backend
Method:
elasticsearch_assets
- Description:
Using an ID, generate a summary of information for higher-level entities.
Configuration Options:
- index: Name of the index holding the STAC entities.
- id_term: Term used for aggregating the STAC entities.
- client_kwargs: Connection parameters passed to elasticsearch.Elasticsearch (https://elasticsearch-py.readthedocs.io/en/7.10.0/api.html).
- bbox: List of terms whose aggregate bbox should be returned.
- min: List of terms whose aggregate minimum should be returned.
- max: List of terms whose aggregate maximum should be returned.
- sum: List of terms whose aggregate sum should be returned.
- list: List of terms for which a list of their aggregated values should be returned.
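The bbox, min, max, sum and list options describe how the values of a term are combined across the matched entities. The following is a minimal pure-Python sketch of those semantics, not the backend's implementation. The function name aggregate, the nested output layout, the [min_x, min_y, max_x, max_y] bbox order and returning distinct values for list are all assumptions.

```python
from typing import Any, Callable


def aggregate(
    entities: list[dict[str, Any]], options: dict[str, list[str]]
) -> dict[str, dict[str, Any]]:
    """Hypothetical illustration of the bbox/min/max/sum/list options.

    Assumes each entity is a flat dict; output is grouped by option name.
    """
    summary: dict[str, dict[str, Any]] = {}

    # bbox: union of all boxes, assumed to be [min_x, min_y, max_x, max_y].
    for term in options.get("bbox", []):
        boxes = [e[term] for e in entities if term in e]
        if boxes:
            summary.setdefault("bbox", {})[term] = [
                min(b[0] for b in boxes),
                min(b[1] for b in boxes),
                max(b[2] for b in boxes),
                max(b[3] for b in boxes),
            ]

    # min / max / sum: reduce the values of each term across all entities.
    reducers: dict[str, Callable[[list[Any]], Any]] = {"min": min, "max": max, "sum": sum}
    for op, reduce_values in reducers.items():
        for term in options.get(op, []):
            values = [e[term] for e in entities if term in e]
            if values:
                summary.setdefault(op, {})[term] = reduce_values(values)

    # list: distinct values of each term, in first-seen order (assumption).
    for term in options.get("list", []):
        values = [e[term] for e in entities if term in e]
        summary.setdefault("list", {})[term] = list(dict.fromkeys(values))

    return summary
```

ISO 8601 date strings sort lexicographically, so min and max also work on start and end datetimes stored as strings.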
Configuration Example:

    name: elasticsearch
    inputs:
      index: ceda-index
      id_term: item_id
      client_kwargs:
        hosts: ['host1:9200', 'host2:9200']
      fields:
        - roles
- input_class
alias of ElasticsearchAssetsInput
- run(body: dict[str, Any]) → Any
Run the backend.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
- class extraction_methods.plugins.assets.backends.elasticsearch.ElasticsearchAssetsInput(*, exists_key: str = '$', exists_delimiter: str = '.', index: str, client_kwargs: dict[str, Any] = {}, request_timeout: int = 60, regex: str, search_field: str, href_term: str = 'path', extra_fields: list[KeyOutputKey] = [])
Bases: Input
Model for Elasticsearch Assets Backend Input.
- client_kwargs: dict[str, Any]
- extra_fields: list[KeyOutputKey]
- href_term: str
- index: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'client_kwargs': FieldInfo(annotation=dict[str, Any], required=False, default={}, description='Elasticsearch connection kwargs.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'extra_fields': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='term for method to output to.'), 'href_term': FieldInfo(annotation=str, required=False, default='path', description='term to use for href.'), 'index': FieldInfo(annotation=str, required=True, description='Elasticsearch index to search on.'), 'regex': FieldInfo(annotation=str, required=True, description='Regex to test against.'), 'request_timeout': FieldInfo(annotation=int, required=False, default=60, description='Request timeout for search.'), 'search_field': FieldInfo(annotation=str, required=True, description='Term to search for regex on.')}
Metadata about the fields defined on the model, mapping field names to pydantic.fields.FieldInfo.
This replaces Model.__fields__ from Pydantic V1.
- regex: str
- request_timeout: int
- search_field: str
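This is a minimal sketch of how the input fields above might map onto a search and its results. It assumes the 7.x elasticsearch-py client, whose search() accepts index, body and request_timeout. The helpers build_search and hits_to_assets are hypothetical. The same applies to keying each asset by the hit's _id and to treating extra_fields as (key, output_key) pairs. The actual backend may behave differently.

```python
from typing import Any, Optional


def build_search(
    index: str, search_field: str, regex: str, request_timeout: int = 60
) -> dict[str, Any]:
    """Hypothetical helper: keyword arguments for a 7.x client search() call."""
    return {
        "index": index,
        # Query DSL regexp query on the configured search field.
        "body": {"query": {"regexp": {search_field: {"value": regex}}}},
        "request_timeout": request_timeout,
    }


def hits_to_assets(
    response: dict[str, Any],
    href_term: str = "path",
    extra_fields: Optional[list[tuple[str, str]]] = None,
) -> dict[str, dict[str, Any]]:
    """Hypothetical helper: turn search hits into an assets dict keyed by hit _id."""
    assets: dict[str, dict[str, Any]] = {}
    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        # The asset href is read from the href_term field of each document.
        asset: dict[str, Any] = {"href": source[href_term]}
        # Assumed (key, output_key) pairs: copy source[key] to asset[output_key].
        for key, output_key in extra_fields or []:
            if key in source:
                asset[output_key] = source[key]
        assets[hit["_id"]] = asset
    return assets
```

The returned dict could be unpacked into a client call as client.search(**build_search(...)). Elasticsearch regexp queries use Lucene's regular-expression syntax, in which patterns are always anchored, so a pattern written for Python's re may need adjusting.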
extraction_methods.plugins.assets.backends.intake_esm module
extraction_methods.plugins.assets.backends.regex module
Regex Assets Backend
- class extraction_methods.plugins.assets.backends.regex.RegexAssets(*args: Any, **kwargs: Any)
Bases: Backend
Method:
regex_assets
- Description:
Takes a regular expression and yields a dictionary for each matching path.
Configuration Options:
- input_term: The regular expression to match against the path.
Example configuration:

    method: regex
    inputs:
      input_term: ^(?:[^_]*_){2}(?P<datetime>\d*)
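The example pattern can be tried with Python's re module. This sketch writes the digit run as \d* and assumes the pattern is applied to a file name. It uses a hypothetical helper, extract, that returns the named groups. The file names in the usage below are made up.

```python
import re

# Skip the first two underscore-delimited fields, then capture the run of
# digits that follows as the named group "datetime".
PATTERN = re.compile(r"^(?:[^_]*_){2}(?P<datetime>\d*)")


def extract(name: str) -> dict[str, str]:
    """Hypothetical helper: named groups captured from a name, or {} on no match."""
    match = PATTERN.match(name)
    return match.groupdict() if match else {}
```

For example, extract("tas_day_20200101.nc") returns {'datetime': '20200101'}. \d* also matches an empty string, so a name with two underscores but no digits after them, such as "tas_day_x.nc", yields an empty datetime rather than no match. Using \d+ would reject such names.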
- input_class
alias of RegexAssetsInput
- run(body: dict[str, Any]) → Any
Run the backend.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
- class extraction_methods.plugins.assets.backends.regex.RegexAssetsInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri')
Bases: Input
Model for Regex Assets Backend Input.
- input_term: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='term for method to run on.')}
Metadata about the fields defined on the model, mapping field names to pydantic.fields.FieldInfo.
This replaces Model.__fields__ from Pydantic V1.
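Both input models share exists_key (default '$') and exists_delimiter (default '.'), and input_term defaults to '$uri'. One plausible reading is that a term starting with exists_key refers to a value already present in the body, with exists_delimiter separating nested keys. This sketch assumes that reading; it is not confirmed by this reference. The function resolve_term is hypothetical.

```python
from typing import Any


def resolve_term(
    term: Any, body: dict[str, Any], exists_key: str = "$", exists_delimiter: str = "."
) -> Any:
    """Hypothetical lookup of an exists_key-prefixed term in the body.

    Terms that are not strings, or do not start with exists_key, are returned
    unchanged. Otherwise the remainder is split on exists_delimiter and walked
    through nested dicts; a missing key raises KeyError.
    """
    if not isinstance(term, str) or not term.startswith(exists_key):
        return term
    value: Any = body
    for key in term[len(exists_key):].split(exists_delimiter):
        value = value[key]
    return value
```

Under this reading, the default input_term '$uri' would resolve to body["uri"], and '$properties.variable' to body["properties"]["variable"].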