extraction_methods.plugins.assets.backends package
Submodules
extraction_methods.plugins.assets.backends.elasticsearch module
Elasticsearch Assets Backend
- class extraction_methods.plugins.assets.backends.elasticsearch.ElasticsearchAssets(*args: Any, **kwargs: Any)
- Bases: Backend
- Method: elasticsearch_assets
- Description: Uses an ID to generate a summary of information for higher-level entities.
 - Configuration Options:
  - ``index``: Name of the index holding the STAC entities.
  - ``id_term``: Term used for aggregating the STAC entities.
  - ``client_kwargs``: Connection parameters passed to `elasticsearch.Elasticsearch <https://elasticsearch-py.readthedocs.io/en/7.10.0/api.html>`_.
  - ``bbox``: List of terms whose aggregate bbox should be returned.
  - ``min``: List of terms whose aggregate minimum should be returned.
  - ``max``: List of terms whose aggregate maximum should be returned.
  - ``sum``: List of terms whose aggregate sum should be returned.
  - ``list``: List of terms whose aggregated values should be returned as a list.
 - Configuration Example:

    - name: elasticsearch
      inputs:
        index: ceda-index
        id_term: item_id
        client_kwargs:
          hosts: ['host1:9200', 'host2:9200']
        fields:
          - roles
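The ``bbox``, ``min``, ``max``, ``sum`` and ``list`` options describe per-term aggregations over the entities that share an ``id_term`` value. A minimal pure-Python sketch of those semantics follows, assuming each entity is a flat dict, every bbox is ``[west, south, east, north]``, and ``list`` means distinct values in first-seen order. ``aggregate`` is a hypothetical helper for illustration, not this backend's implementation.

```python
from typing import Any


def aggregate(records: list[dict[str, Any]], config: dict[str, list[str]]) -> dict[str, Any]:
    """Summarise terms across records, following the min/max/sum/list/bbox options."""
    summary: dict[str, Any] = {}

    def values(term: str) -> list[Any]:
        # Collect the term from every record that has it.
        return [record[term] for record in records if term in record]

    for term in config.get("min", []):
        if vals := values(term):
            summary[term] = min(vals)
    for term in config.get("max", []):
        if vals := values(term):
            summary[term] = max(vals)
    for term in config.get("sum", []):
        summary[term] = sum(values(term))
    for term in config.get("list", []):
        # Assumption: "list" means distinct values, kept in first-seen order.
        summary[term] = list(dict.fromkeys(values(term)))
    for term in config.get("bbox", []):
        # Each bbox is [west, south, east, north]; antimeridian crossing is ignored.
        if boxes := values(term):
            summary[term] = [
                min(box[0] for box in boxes),
                min(box[1] for box in boxes),
                max(box[2] for box in boxes),
                max(box[3] for box in boxes),
            ]
    return summary


records = [
    {"start_datetime": "2020-01-01", "size": 10, "bbox": [0, 0, 10, 10], "variable": "tas"},
    {"start_datetime": "2019-06-01", "size": 5, "bbox": [-5, 2, 8, 12], "variable": "pr"},
    {"start_datetime": "2021-03-01", "size": 1, "bbox": [1, 1, 2, 2], "variable": "tas"},
]
result = aggregate(
    records,
    {"min": ["start_datetime"], "sum": ["size"], "list": ["variable"], "bbox": ["bbox"]},
)
# {'start_datetime': '2019-06-01', 'size': 16, 'variable': ['tas', 'pr'], 'bbox': [-5, 0, 10, 12]}
```

ISO-8601 date strings compare correctly as plain strings, which is why ``min`` works on ``start_datetime`` here without parsing.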
 
 
 - input_class
 - alias of ElasticsearchAssetsInput
 - run(body: dict[str, Any]) → Any
 - Run the backend.
 - Parameters:
- body (dict) – current generated properties 
- Returns:
- updated body dict 
- Return type:
- dict 
 
 
- class extraction_methods.plugins.assets.backends.elasticsearch.ElasticsearchAssetsInput(*, exists_key: str = '$', exists_delimiter: str = '.', index: str, client_kwargs: dict[str, Any] = {}, request_timeout: int = 60, regex: str, search_field: str, href_term: str = 'path', extra_fields: list[KeyOutputKey] = [])
- Bases: Input
- Model for Elasticsearch Assets Backend Input.
 - client_kwargs: dict[str, Any]
 - extra_fields: list[KeyOutputKey]
 - href_term: str
 - index: str
 - model_config: ClassVar[ConfigDict] = {}
- Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
 - model_fields: ClassVar[dict[str, FieldInfo]] = {'client_kwargs': FieldInfo(annotation=dict[str, Any], required=False, default={}, description='Elasticsearch connection kwargs.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'extra_fields': FieldInfo(annotation=list[KeyOutputKey], required=False, default=[], description='term for method to output to.'), 'href_term': FieldInfo(annotation=str, required=False, default='path', description='term to use for href.'), 'index': FieldInfo(annotation=str, required=True, description='Elasticsearch index to search on.'), 'regex': FieldInfo(annotation=str, required=True, description='Regex to test against.'), 'request_timeout': FieldInfo(annotation=int, required=False, default=60, description='Request timeout for search.'), 'search_field': FieldInfo(annotation=str, required=True, description='Term to search for regex on.')}
- Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo. This replaces Model.__fields__ from Pydantic V1.
 - regex: str
 - request_timeout: int
 - search_field: str
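ElasticsearchAssetsInput combines ``index``, ``search_field``, ``regex`` and ``href_term``. One plausible reading, sketched here purely as an assumption, is that the backend issues an Elasticsearch ``regexp`` query on ``search_field`` and turns each hit into an asset whose href is taken from the ``href_term`` field. ``build_search_body`` and ``hits_to_assets`` are hypothetical helpers, the response is a hand-written stand-in, and ``extra_fields`` is not modelled. With a 7.x elasticsearch-py client, such a body could be passed as ``Elasticsearch(**client_kwargs).search(index=index, body=body, request_timeout=request_timeout)``.

```python
from typing import Any


def build_search_body(search_field: str, regex: str) -> dict[str, Any]:
    """Hypothetical: an Elasticsearch `regexp` query on search_field."""
    # Lucene regular expressions must match the whole indexed term.
    return {"query": {"regexp": {search_field: {"value": regex}}}}


def hits_to_assets(response: dict[str, Any], href_term: str = "path") -> list[dict[str, Any]]:
    """Hypothetical: one asset per hit, taking the href from the href_term field."""
    return [{"href": hit["_source"][href_term]} for hit in response["hits"]["hits"]]


body = build_search_body("path", r"/data/.*\.nc")
# A hand-written stand-in for a search response; no cluster is contacted.
response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"path": "/data/a.nc"}},
            {"_id": "2", "_source": {"path": "/data/b.nc"}},
        ]
    }
}
assets = hits_to_assets(response)  # [{'href': '/data/a.nc'}, {'href': '/data/b.nc'}]
```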
 
extraction_methods.plugins.assets.backends.intake_esm module
Intake Assets Backend
- class extraction_methods.plugins.assets.backends.intake_esm.IntakeESMAssets(*args: Any, **kwargs: Any)
- Bases: Backend
- Method: intake_assets
- Description: Searches an Intake catalog to provide a stream of assets for processing, using the catalog as the source of file objects.
 - Configuration Options:
  - ``input_term``: The path or URL of an ESM collection JSON file. ``DEFAULT``: ``$uri``
  - ``href_term``: The column header that contains the URI of the file object. ``DEFAULT``: ``path``
  - ``datastore_kwargs``: Optional kwargs to pass to `intake.open_esm_datastore <https://intake-esm.readthedocs.io/en/latest/api.html#intake_esm.core.esm_datastore>`_
  - ``search_kwargs``: Optional kwargs to pass to `esm_datastore.search <https://intake-esm.readthedocs.io/en/latest/api.html#intake_esm.core.esm_datastore.search>`_
 - Example Configuration:

    - method: intake_esm
      inputs:
        href_term: url
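Going by the options above, the IntakeESM backend opens the datastore named by ``input_term``, narrows it with ``search_kwargs`` and emits one asset per remaining row, reading the href from the ``href_term`` column. With intake-esm installed that roughly corresponds to ``intake.open_esm_datastore(uri, **datastore_kwargs).search(**search_kwargs).df[href_term]``. The sketch below imitates that flow on plain dict rows so it runs without intake; ``search_rows`` and ``iter_assets`` are hypothetical, the column names are illustrative, and matching is simplified to equality or list membership, whereas intake-esm's ``search`` also accepts regular-expression values.

```python
from typing import Any, Iterator, Optional


def search_rows(rows: list[dict[str, Any]], **search_kwargs: Any) -> list[dict[str, Any]]:
    """Keep rows whose columns equal the requested value (or one of a list of values)."""

    def matches(row: dict[str, Any]) -> bool:
        for column, wanted in search_kwargs.items():
            allowed = wanted if isinstance(wanted, list) else [wanted]
            if row.get(column) not in allowed:
                return False
        return True

    return [row for row in rows if matches(row)]


def iter_assets(
    rows: list[dict[str, Any]],
    href_term: str = "path",
    search_kwargs: Optional[dict[str, Any]] = None,
) -> Iterator[dict[str, Any]]:
    """Yield one asset per matching catalog row, taking the href from href_term."""
    for row in search_rows(rows, **(search_kwargs or {})):
        yield {"href": row[href_term]}


# Stand-in for the rows of an ESM catalog (esm_datastore.df in intake-esm).
rows = [
    {"variable_id": "tas", "experiment_id": "historical", "url": "https://example.org/tas_historical.nc"},
    {"variable_id": "pr", "experiment_id": "historical", "url": "https://example.org/pr_historical.nc"},
    {"variable_id": "tas", "experiment_id": "ssp585", "url": "https://example.org/tas_ssp585.nc"},
]
assets = list(iter_assets(rows, href_term="url", search_kwargs={"variable_id": "tas"}))
```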
 - input_class
 - alias of IntakeESMAssetsInput
 - run(body: dict[str, Any]) → Any
 - Run the backend.
 - Parameters:
- body (dict) – current generated properties 
- Returns:
- updated body dict 
- Return type:
- dict 
 
 
- class extraction_methods.plugins.assets.backends.intake_esm.IntakeESMAssetsInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri', href_term: str = 'path', datastore_kwargs: dict[str, Any] = {}, search_kwargs: dict[str, Any] = {})
- Bases: Input
- Model for IntakeESM Assets Backend Input.
 - datastore_kwargs: dict[str, Any]
 - href_term: str
 - input_term: str
 - model_config: ClassVar[ConfigDict] = {}
- Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
 - model_fields: ClassVar[dict[str, FieldInfo]] = {'datastore_kwargs': FieldInfo(annotation=dict[str, Any], required=False, default={}, description='kwargs to open datastore.'), 'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'href_term': FieldInfo(annotation=str, required=False, default='path', description='term to use for href.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='term for method to run on.'), 'search_kwargs': FieldInfo(annotation=dict[str, Any], required=False, default={}, description='kwargs for search.')}
- Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo. This replaces Model.__fields__ from Pydantic V1.
 - search_kwargs: dict[str, Any]
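Every input model here carries ``exists_key`` (default ``$``) and ``exists_delimiter`` (default ``.``), described as the key that signifies a previously extracted term and the delimiter for nested terms. Assuming that means a value such as the default ``input_term`` of ``$uri`` is looked up in the current body, and that ``$a.b`` walks nested dicts, a hypothetical ``resolve_term`` helper could look like this. How the real plugins treat missing keys is not documented here; this sketch simply raises ``KeyError``.

```python
from typing import Any


def resolve_term(
    body: dict[str, Any], term: Any, exists_key: str = "$", exists_delimiter: str = "."
) -> Any:
    """Look up `$a.b` as body["a"]["b"]; return anything else unchanged."""
    if not isinstance(term, str) or not term.startswith(exists_key):
        return term  # a literal value, not a reference to an extracted term
    value: Any = body
    for key in term[len(exists_key):].split(exists_delimiter):
        value = value[key]  # raises KeyError if the term was never extracted
    return value


body = {"uri": "/data/catalog.json", "properties": {"datetime": "2020-01-01"}}
```

For example, ``resolve_term(body, "$uri")`` gives ``"/data/catalog.json"`` and ``resolve_term(body, "$properties.datetime")`` gives ``"2020-01-01"``, while a plain ``"path"`` is returned as-is.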
 
extraction_methods.plugins.assets.backends.regex module
Regex Assets Backend
- class extraction_methods.plugins.assets.backends.regex.RegexAssets(*args: Any, **kwargs: Any)
- Bases: Backend
- Method: regex_assets
- Description: Takes a regular expression and yields a dictionary for each matching path.
 - Configuration Options:
  - ``input_term``: The regular expression to match against the path.
 - Example Configuration:

    - method: regex
      inputs:
        input_term: ^(?:[^_]*_){2}(?P<datetime>\d*)
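The example pattern skips the first two underscore-terminated fields of a name and captures the digits that follow as a named group, ``datetime``. The sketch below uses Python's ``re`` module to yield ``{"href": path, **groups}`` for each matching path; that dictionary shape is an assumption about what the description means, and ``match_paths`` is a hypothetical helper. Writing the pattern as a raw string keeps ``\d`` from being treated as an invalid string escape (flake8 W605).

```python
import re
from typing import Any, Iterable, Iterator

# Raw string, so "\d" reaches the regex engine instead of being an invalid escape (W605).
PATTERN = r"^(?:[^_]*_){2}(?P<datetime>\d*)"


def match_paths(paths: Iterable[str], pattern: str) -> Iterator[dict[str, Any]]:
    """Yield {"href": path, **named_groups} for every path the pattern matches."""
    compiled = re.compile(pattern)
    for path in paths:
        match = compiled.match(path)
        if match:
            yield {"href": path, **match.groupdict()}


matches = list(match_paths(["tas_day_20200101.nc", "readme.txt"], PATTERN))
# [{'href': 'tas_day_20200101.nc', 'datetime': '20200101'}]
```

Because ``\d*`` also matches zero digits, a name such as ``a_b_c.nc`` still matches, with an empty ``datetime``; use ``\d+`` if that should be rejected.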
 - input_class
 - alias of RegexAssetsInput
 - run(body: dict[str, Any]) → Any
 - Run the backend.
 - Parameters:
- body (dict) – current generated properties 
- Returns:
- updated body dict 
- Return type:
- dict 
 
 
- class extraction_methods.plugins.assets.backends.regex.RegexAssetsInput(*, exists_key: str = '$', exists_delimiter: str = '.', input_term: str = '$uri')
- Bases: Input
- Model for Regex Assets Backend Input.
 - input_term: str
 - model_config: ClassVar[ConfigDict] = {}
- Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
 - model_fields: ClassVar[dict[str, FieldInfo]] = {'exists_delimiter': FieldInfo(annotation=str, required=False, default='.', description='Delimiter for nested exists terms.'), 'exists_key': FieldInfo(annotation=str, required=False, default='$', description='Key to signify a previously extracted terms.'), 'input_term': FieldInfo(annotation=str, required=False, default='$uri', description='term for method to run on.')}
- Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo. This replaces Model.__fields__ from Pydantic V1.