extraction_methods.plugins.assets.backends package

Submodules

extraction_methods.plugins.assets.backends.elasticsearch module

Elasticsearch Assets Backend

class extraction_methods.plugins.assets.backends.elasticsearch.ElasticsearchAssets(*args: Any, **kwargs: Any)

Bases: Backend

Method: elasticsearch_assets

Description: Using an ID, generate a summary of information for higher-level entities.

Configuration Options:

- ``index``: Name of the index holding the STAC entities.
- ``id_term``: Term used for aggregating the STAC entities.
- ``client_kwargs``: Connection parameters passed to
  `elasticsearch.Elasticsearch <https://elasticsearch-py.readthedocs.io/en/7.10.0/api.html>`_
- ``bbox``: List of terms whose aggregate bbox should be returned.
- ``min``: List of terms whose aggregate minimum should be returned.
- ``max``: List of terms whose aggregate maximum should be returned.
- ``sum``: List of terms whose aggregate sum should be returned.
- ``list``: List of terms whose aggregated values should be returned as a list.

Configuration Example:

.. code-block:: yaml

    name: elasticsearch
    inputs:
      index: ceda-index
      id_term: item_id
      client_kwargs:
        hosts: ['host1:9200', 'host2:9200']
      fields:
        - roles

input_class: alias of ElasticsearchAssetsInput

run(body: dict[str, Any]) → Any

Run the backend.

Parameters: body (dict) – current generated properties
Returns: updated body dict
Return type: dict

class extraction_methods.plugins.assets.backends.elasticsearch.ElasticsearchAssetsInput(*, exists_key: str = '$', exists_delimiter: str = '.', index: str, client_kwargs: dict[str, Any] = {}, request_timeout: int = 60, regex: str, search_field: str, href_term: str = 'path', extra_fields: list[KeyOutputKey] = [])

Bases: Input

Model for Elasticsearch Assets Backend Input.

client_kwargs: dict[str, Any]

extra_fields: list[KeyOutputKey]

href_term: str

index: str

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[dict[str, FieldInfo]] = {'client_kwargs': FieldInfo(annotation=dict[str, Any], required=False, default={}, description='Elasticsearch connection kwargs.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'extra_fields': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='term for method to output to.'), 'href_term': FieldInfo(annotation=str, required=False, default='path', description='term to use for href.'), 'index': FieldInfo(annotation=str, required=True, description='Elasticsearch index to search on.'), 'regex': FieldInfo(annotation=str, required=True, description='Regex to test against.'), 'request_timeout': FieldInfo(annotation=int, required=False, default=60, description='Request timeout for search.'), 'search_field': FieldInfo(annotation=str, required=True, description='Term to search for regex on.')}

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

regex: str

request_timeout: int

search_field: str
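The ``regex`` and ``search_field`` inputs suggest a regular-expression search against a single field of the index. The sketch below shows how such a request body could be assembled with the standard Elasticsearch ``regexp`` query. It is an illustration only: the helper ``build_regexp_query`` and the sample values are hypothetical, and the backend's actual query may differ.

```python
from typing import Any


def build_regexp_query(search_field: str, regex: str) -> dict[str, Any]:
    """Build an Elasticsearch ``regexp`` query body (hypothetical helper)."""
    return {"query": {"regexp": {search_field: {"value": regex}}}}


# Hypothetical values for the documented input fields.
inputs = {
    "index": "ceda-index",
    "search_field": "path",
    "regex": ".*\\.nc",
    "request_timeout": 60,
}

# Only the query body is built here; no connection to a cluster is made.
query_body = build_regexp_query(inputs["search_field"], inputs["regex"])
```

A body like this would be sent to the index named by ``index``, with ``request_timeout`` bounding how long the client waits for the search.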

extraction_methods.plugins.assets.backends.intake_esm module

Intake Assets Backend

class extraction_methods.plugins.assets.backends.intake_esm.IntakeESMAssets(*args: Any, **kwargs: Any)

Bases: Backend

Method: intake_assets

Description: Searches an Intake catalog to provide a stream of assets for processing. The Intake catalog is used as the source of file objects.

Configuration Options:

- ``input_term``: The URI of a path or URL to an ESM collection JSON
  file. ``DEFAULT``: ``$uri``
- ``href_term``: The column header which contains the URI to the file
  object. ``DEFAULT``: ``path``
- ``datastore_kwargs``: Optional kwargs to pass to
  `intake.open_esm_datastore <https://intake-esm.readthedocs.io/en/latest/api.html#intake_esm.core.esm_datastore>`_
- ``search_kwargs``: Optional kwargs to pass to
  `esm_datastore.search <https://intake-esm.readthedocs.io/en/latest/api.html#intake_esm.core.esm_datastore.search>`_

Example Configuration:

.. code-block:: yaml

    method: intake_esm
    inputs:
      href_term: url

input_class: alias of IntakeESMAssetsInput

run(body: dict[str, Any]) → Any

Run the backend.

Parameters: body (dict) – current generated properties
Returns: updated body dict
Return type: dict

class extraction_methods.plugins.assets.backends.intake_esm.IntakeESMAssetsInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri', href_term: str = 'path', datastore_kwargs: dict[str, Any] = {}, search_kwargs: dict[str, Any] = {})

Bases: Input

Model for IntakeESM Assets Backend Input.

datastore_kwargs: dict[str, Any]

href_term: str

input_term: str

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[dict[str, FieldInfo]] = {'datastore_kwargs': FieldInfo(annotation=dict[str, Any], required=False, default={}, description='kwargs to open datastore.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'href_term': FieldInfo(annotation=str, required=False, default='path', description='term to use for href.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='term for method to run on.'), 'search_kwargs': FieldInfo(annotation=dict[str, Any], required=False, default={}, description='kwargs for search.')}

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

search_kwargs: dict[str, Any]
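The ``exists_key`` (``$``) and ``exists_delimiter`` (``.``) fields describe how a term such as ``$uri`` refers to a previously extracted value. A minimal sketch of that lookup, assuming the extracted values live in the ``body`` dict passed to ``run``, might look as follows. The function ``resolve_term`` and the sample ``body`` are hypothetical and are not taken from the package.

```python
from typing import Any


def resolve_term(
    term: str,
    body: dict[str, Any],
    exists_key: str = "$",
    exists_delimiter: str = ".",
) -> Any:
    """Return the value a term refers to, or the term itself if it is a literal."""
    if not term.startswith(exists_key):
        # No prefix: treat the term as a literal value.
        return term
    value: Any = body
    # Strip the prefix, then walk nested dicts one key at a time.
    for part in term[len(exists_key):].split(exists_delimiter):
        value = value[part]
    return value


# Hypothetical previously extracted properties.
body = {"uri": "/data/catalog.json", "properties": {"platform": "sentinel"}}

resolved_uri = resolve_term("$uri", body)
resolved_nested = resolve_term("$properties.platform", body)
literal = resolve_term("path", body)
```

Under this reading, the default ``input_term`` of ``$uri`` resolves to whatever ``uri`` value an earlier step placed in the body, while an unprefixed term such as ``path`` is used as written.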

extraction_methods.plugins.assets.backends.regex module

Regex Assets Backend

class extraction_methods.plugins.assets.backends.regex.RegexAssets(*args: Any, **kwargs: Any)

Bases: Backend

Method: regex_assets

Description: Takes a regular expression and yields a dictionary for each matching path.

Configuration Options:

- ``input_term``: The regular expression to match against the path

Example Configuration:

.. code-block:: yaml

    method: regex
    inputs:
      input_term: ^(?:[^_]*_){2}(?P<datetime>\d*)
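To illustrate how a pattern with named groups can turn a path into a dictionary, the sketch below applies the example pattern with Python's ``re`` module, assuming the digit class ``\d`` was intended for the ``datetime`` group. The helper ``extract_terms`` and the sample file names are hypothetical; this is not the backend's own implementation.

```python
import re

# Example pattern: skip two underscore-terminated segments, then capture digits.
PATTERN = re.compile(r"^(?:[^_]*_){2}(?P<datetime>\d*)")


def extract_terms(path: str) -> dict[str, str]:
    """Return the named groups matched in ``path``, or an empty dict if no match."""
    match = PATTERN.search(path)
    return match.groupdict() if match else {}


# Hypothetical file names.
terms = extract_terms("sensor_level2_20200101_v1.nc")
no_match = extract_terms("nounderscores.nc")
```

Here ``terms`` holds the captured ``datetime`` string, and a name with fewer than two underscores produces an empty dictionary.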

input_class: alias of RegexAssetsInput

run(body: dict[str, Any]) → Any

Run the backend.

Parameters: body (dict) – current generated properties
Returns: updated body dict
Return type: dict

class extraction_methods.plugins.assets.backends.regex.RegexAssetsInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri')

Bases: Input

Model for Regex Assets Backend Input.

input_term: str

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='term for method to run on.')}

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.
