extraction_methods.plugins.assets.backends package
Submodules
extraction_methods.plugins.assets.backends.elasticsearch module
Elasticsearch Assets Backend
- class extraction_methods.plugins.assets.backends.elasticsearch.ElasticsearchAssets(extraction_method_conf: ExtractionMethodConf, *args: Any, **kwargs: Any)
Bases: Backend
Method: elasticsearch_assets
Description:
Generates a summary of information for higher-level entities, using an ID.
Configuration Options:
- ``index``: Name of the index holding the STAC entities.
- ``client_kwargs``: Parameters to pass to `elasticsearch.Elasticsearch <https://elasticsearch-py.readthedocs.io/en/7.10.0/api.html>`_.
- ``request_timeout``: Timeout for the Elasticsearch request.
- ``body``: List of terms for which the aggregate bbox should be returned.
- ``id_term``: Term used for aggregating the STAC entities.
Configuration Example:

.. code-block:: yaml

    name: elasticsearch
    inputs:
      index: ceda-index
      id_term: item_id
      client_kwargs:
        hosts: ['host1:9200', 'host2:9200']
- fields:
roles
- input_class
alias of
ElasticsearchAssetsInput
- run(body: dict[str, Any]) → Any
Run the backend.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
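As a rough illustration of how the options above might fit together, the sketch below assembles the arguments for a search request and turns a response into per-asset dictionaries. It is not the package's implementation: `build_search_kwargs` and `hits_to_assets` are hypothetical helpers, the `href` output key is an assumption, and the response is a hand-written stand-in that follows the usual Elasticsearch `hits.hits[]._source` layout. No cluster is contacted.

```python
from typing import Any


def build_search_kwargs(inputs: dict[str, Any]) -> dict[str, Any]:
    """Collect the arguments a search request would need (hypothetical helper)."""
    return {
        "index": inputs["index"],
        "body": inputs["body"],
        # 60 seconds is the default listed for request_timeout.
        "request_timeout": inputs.get("request_timeout", 60),
    }


def hits_to_assets(
    response: dict[str, Any], href_term: str = "path"
) -> list[dict[str, Any]]:
    """Copy each hit's _source and add an 'href' key (assumed output shape)."""
    assets = []
    for hit in response["hits"]["hits"]:
        source = dict(hit["_source"])
        # Skip documents that lack the configured href field.
        if href_term not in source:
            continue
        source["href"] = source[href_term]
        assets.append(source)
    return assets


inputs = {
    "index": "ceda-index",
    "body": {"query": {"term": {"item_id": "abc123"}}},
}
search_kwargs = build_search_kwargs(inputs)

# With a live cluster the request would look roughly like:
#   client = elasticsearch.Elasticsearch(**client_kwargs)
#   response = client.search(**search_kwargs)
fake_response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"path": "/data/a.nc", "roles": ["data"]}},
            {"_id": "2", "_source": {"roles": ["metadata"]}},
        ]
    }
}
assets = hits_to_assets(fake_response)
print(assets)
```

The second hit is dropped because it has no `path` field; whether the real backend skips or rejects such documents is not stated in this reference.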
- class extraction_methods.plugins.assets.backends.elasticsearch.ElasticsearchAssetsInput(*, exists_key: str = '$', exists_delimiter: str = '.', index: str, client_kwargs: dict[str, Any] = {}, request_timeout: int = 60, body: dict[str, Any], href_term: str = 'path')
Bases: Input
Model for Elasticsearch Assets Backend Input.
- body: dict[str, Any]
- client_kwargs: dict[str, Any]
- href_term: str
- index: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'body': FieldInfo(annotation=dict[str, Any], required=True, description='Body for Elasticsearch search request.'), 'client_kwargs': FieldInfo(annotation=dict[str, Any], required=False, default={}, description='Elasticsearch connection kwargs.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'href_term': FieldInfo(annotation=str, required=False, default='path', description='term to use for href.'), 'index': FieldInfo(annotation=str, required=True, description='Elasticsearch index to search on.'), 'request_timeout': FieldInfo(annotation=int, required=False, default=60, description='Request timeout for search.')}
Metadata about the fields defined on the model, mapping of field names to pydantic.fields.FieldInfo.
This replaces Model.__fields__ from Pydantic V1.
- request_timeout: int
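The `exists_key` and `exists_delimiter` fields are described only briefly, so the following sketch shows one plausible reading: a term starting with `exists_key` names a previously extracted value, and `exists_delimiter` separates the keys of a nested lookup. `resolve_term` is a hypothetical helper written for this illustration, not a function of the package, and the sample `body` is invented.

```python
from typing import Any


def resolve_term(
    term: str,
    body: dict[str, Any],
    exists_key: str = "$",
    exists_delimiter: str = ".",
) -> Any:
    """Return the value a marked term refers to, or the term itself if literal."""
    if not term.startswith(exists_key):
        return term
    value: Any = body
    # Strip the marker, then walk the nested dictionaries one key at a time.
    for key in term[len(exists_key):].split(exists_delimiter):
        value = value[key]
    return value


body = {"uri": "/data/file.nc", "properties": {"model": "UKESM1"}}

print(resolve_term("$uri", body))               # /data/file.nc
print(resolve_term("$properties.model", body))  # UKESM1
print(resolve_term("path", body))               # path
```

In this sketch a reference to a missing key raises `KeyError`; how the package handles that case is not covered here.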
extraction_methods.plugins.assets.backends.intake_esm module
Intake Assets Backend
- class extraction_methods.plugins.assets.backends.intake_esm.IntakeESMAssets(extraction_method_conf: ExtractionMethodConf, *args: Any, **kwargs: Any)
Bases: Backend
Method: intake_assets
Description:
Performs a search on an Intake catalog to provide a stream of assets for processing. Uses an Intake catalog as a source for file objects.
Configuration Options:
- ``input_term``: The URI of a path or URL to an ESM collection JSON file. ``DEFAULT``: ``$uri``
- ``href_term``: The column header which contains the URI to the file object. ``DEFAULT``: ``path``
- ``datastore_kwargs``: Optional kwargs to pass to `intake.open_esm_datastore <https://intake-esm.readthedocs.io/en/latest/api.html#intake_esm.core.esm_datastore>`_
- ``search_kwargs``: Optional kwargs to pass to `esm_datastore.search <https://intake-esm.readthedocs.io/en/latest/api.html#intake_esm.core.esm_datastore.search>`_
Example Configuration:

.. code-block:: yaml

    method: intake_esm
    inputs:
      href_term: url
- input_class
alias of
IntakeESMAssetsInput
- run(body: dict[str, Any]) → Any
Run the backend.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
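To make the role of `href_term` and the search filters concrete, here is a small stand-in that does not use intake-esm at all. It reads an in-memory CSV with the standard library, keeps rows whose columns equal the search terms, and renames the `href_term` column to `href`. The `iter_assets` helper, the catalog contents, and the `href` output key are all assumptions made for this illustration.

```python
import csv
import io
from typing import Any, Iterator


def iter_assets(
    catalog_csv: str, href_term: str = "path", **search: str
) -> Iterator[dict[str, Any]]:
    """Yield one asset dict per catalog row whose columns match the search terms."""
    for row in csv.DictReader(io.StringIO(catalog_csv)):
        if all(row.get(column) == wanted for column, wanted in search.items()):
            asset = dict(row)
            # Move the configured column to a fixed 'href' key.
            asset["href"] = asset.pop(href_term)
            yield asset


# Invented catalog table; a real ESM collection points at a much larger CSV.
CATALOG = """variable,frequency,url
tas,day,https://example.org/tas_day.nc
pr,day,https://example.org/pr_day.nc
tas,mon,https://example.org/tas_mon.nc
"""

assets = list(iter_assets(CATALOG, href_term="url", variable="tas"))
for asset in assets:
    print(asset["href"])
```

With intake-esm itself, the equivalent steps would be opening the datastore and calling its `search` method with the configured keyword arguments; the generator form here only mirrors the "stream of assets" wording in the description.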
- class extraction_methods.plugins.assets.backends.intake_esm.IntakeESMAssetsInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri', href_term: str = 'path', datastore_kwargs: dict[str, Any] = {}, search_kwargs: dict[str, Any] = {})
Bases: Input
Model for IntakeESM Assets Backend Input.
- datastore_kwargs: dict[str, Any]
- href_term: str
- input_term: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'datastore_kwargs': FieldInfo(annotation=dict[str, Any], required=False, default={}, description='kwargs to open datastore.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'href_term': FieldInfo(annotation=str, required=False, default='path', description='term to use for href.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='term for method to run on.'), 'search_kwargs': FieldInfo(annotation=dict[str, Any], required=False, default={}, description='kwargs for search.')}
Metadata about the fields defined on the model, mapping of field names to pydantic.fields.FieldInfo.
This replaces Model.__fields__ from Pydantic V1.
- search_kwargs: dict[str, Any]
extraction_methods.plugins.assets.backends.regex module
Regex Assets Backend
- class extraction_methods.plugins.assets.backends.regex.RegexAssets(extraction_method_conf: ExtractionMethodConf, *args: Any, **kwargs: Any)
Bases: Backend
Method: regex_assets
Description:
Takes a regular expression and yields a dictionary for each matching path.
Configuration Options:
- ``input_term``: The regular expression to match against the path.
Example configuration:

.. code-block:: yaml

    method: regex
    inputs:
      input_term: ^(?:[^_]*_){2}(?P<datetime>\d*)
- input_class
alias of
RegexAssetsInput
- run(body: dict[str, Any]) → Any
Run the backend.
- Parameters:
body (dict) – current generated properties
- Returns:
updated body dict
- Return type:
dict
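The pattern from the example configuration can be tried directly with Python's `re` module. The sketch below writes it with `\d` for the digit class and applies it to two invented file names; `extract_terms` is a hypothetical helper, and returning the named groups as a dictionary is an assumption about what "a dictionary for each matching path" contains.

```python
import re

# Skip two underscore-terminated fields, then capture the following
# run of digits under the group name "datetime".
PATTERN = re.compile(r"^(?:[^_]*_){2}(?P<datetime>\d*)")


def extract_terms(path: str) -> dict[str, str]:
    """Return the named groups captured from the path, or {} if no match."""
    match = PATTERN.match(path)
    return match.groupdict() if match else {}


print(extract_terms("tas_day_20200101.nc"))  # {'datetime': '20200101'}
print(extract_terms("nounderscores.nc"))     # {}
```

Because `\d*` also matches an empty string, a path with two underscores but no digits after them still matches and yields an empty `datetime` value; `\d+` would be the stricter choice.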
- class extraction_methods.plugins.assets.backends.regex.RegexAssetsInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri')
Bases: Input
Model for Regex Assets Backend Input.
- input_term: str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='term for method to run on.')}
Metadata about the fields defined on the model, mapping of field names to pydantic.fields.FieldInfo.
This replaces Model.__fields__ from Pydantic V1.