Item Generator

This library aims to be a generic tool for generating STAC-like JSON documents. You should be able to generate either fully STAC-compliant JSON or content that contains all the information needed to construct a valid STAC item.

This library works on the premise that you can build a processing chain for each of your datasets by chaining together different processors to extract the relevant information. The core facet extraction chain works on an atomic basis: input plugins provide a single object for inspection, and the chain outputs a single JSON object. Item IDs are generated based on selected facets. It is then up to your downstream processing to aggregate this information.
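
As a rough illustration of how an ID based on selected facets lets several assets share one item, here is a minimal sketch. The function name `generate_item_id`, the facet names, and the choice of an MD5 hash over dot-joined values are all assumptions made for this example; the library's actual ID scheme may differ.

```python
import hashlib


def generate_item_id(facets: dict, id_facets: list) -> str:
    """Derive a deterministic ID from the values of the selected facets.

    Hypothetical helper: the hashing scheme is an assumption, not the
    library's documented behaviour.
    """
    # Keep the order given in id_facets so the same selection always
    # produces the same string, and therefore the same hash.
    selected = [str(facets[name]) for name in id_facets if name in facets]
    return hashlib.md5(".".join(selected).encode("utf-8")).hexdigest()


# Two assets that differ only in a facet that is NOT part of the ID.
asset_a = {"model": "UKESM1", "experiment": "historical", "variable": "tas"}
asset_b = {"model": "UKESM1", "experiment": "historical", "variable": "pr"}

id_a = generate_item_id(asset_a, ["model", "experiment"])
id_b = generate_item_id(asset_b, ["model", "experiment"])
print(id_a == id_b)  # True: both assets map to the same item
```

Because each asset is processed on its own, a shared ID like this is what allows downstream tooling to recognise that the two JSON objects describe the same item.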

Datastores such as Elasticsearch can make use of upserts, which merge the JSON documents at index time.
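
The sketch below shows what such an upsert could look like. `build_upsert_body` produces the body of an Elasticsearch update request using the `doc` and `doc_as_upsert` fields, and `merge` is a local stand-in that imitates how a partial document is folded into a stored one. Both function names and the sample facets are assumptions for illustration; no Elasticsearch client is called here.

```python
import json


def build_upsert_body(partial_doc: dict) -> dict:
    """Body for an update request that creates the document if it is missing."""
    return {"doc": partial_doc, "doc_as_upsert": True}


def merge(existing: dict, partial: dict) -> dict:
    """Local imitation of the merge: nested objects are merged, other values replaced."""
    merged = dict(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Two outputs for the same item ID, each carrying different facets.
first = {"properties": {"model": "UKESM1"}}
second = {"properties": {"experiment": "historical"}}

stored = merge(merge({}, first), second)
print(json.dumps(stored, sort_keys=True))
# {"properties": {"experiment": "historical", "model": "UKESM1"}}

print(json.dumps(build_upsert_body(second)))
```

With this pattern the order in which the single-object outputs arrive does not matter, as long as they share an item ID.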

Read the Orientation guide as an introduction to the framework.

Installation

At present, not all the required libraries are available via package managers. To install, you’ll need to clone the repository and install the dependencies using requirements.txt:

$ git clone https://github.com/cedadev/item-generator
$ cd item-generator
$ pip install -r requirements.txt
$ pip install .

Configuration

Configuration takes the form of a YAML-formatted file.

Option	Description
`extractor`	The Python import path to the extractor class. If not specified, the class installed under the `asset_scanner.extractors` entry point is used.
`item_descriptions.root_directory`	`REQUIRED` Path to the top-level directory containing your dataset-specific pipelines.
`inputs`	`REQUIRED` Must have at least one input plugin.
`outputs`	`REQUIRED` Must have at least one output plugin.

Sample configuration

item_descriptions:
  root_directory: /etc/item-generator/item_descriptions/descriptions
inputs:
  - name: file_system
    path: /badc/faam/data/2005/b069-jan-05
outputs:
  - name: standard_out
    namespace: assets
  - name: standard_out
    namespace: facets
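
To make the required options concrete, the following sketch checks a parsed configuration against the table above. It assumes the YAML file has already been loaded into a plain mapping (for example with PyYAML's `yaml.safe_load`); the function `validate_config` is hypothetical and is not part of the library.

```python
def validate_config(conf: dict) -> list:
    """Return a list of problems found in a parsed configuration mapping."""
    problems = []

    # item_descriptions.root_directory is marked REQUIRED.
    descriptions = conf.get("item_descriptions") or {}
    if not descriptions.get("root_directory"):
        problems.append("item_descriptions.root_directory is required")

    # inputs and outputs must each list at least one plugin.
    for section in ("inputs", "outputs"):
        if not conf.get(section):
            problems.append(f"{section} must list at least one plugin")

    return problems


# The sample configuration as it would look after parsing the YAML file.
sample = {
    "item_descriptions": {
        "root_directory": "/etc/item-generator/item_descriptions/descriptions"
    },
    "inputs": [{"name": "file_system", "path": "/badc/faam/data/2005/b069-jan-05"}],
    "outputs": [
        {"name": "standard_out", "namespace": "assets"},
        {"name": "standard_out", "namespace": "facets"},
    ],
}

print(validate_config(sample))          # []
print(validate_config({"inputs": []}))  # three problems reported
```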

Configuration for the extraction pipelines is done separately. It could be stored in a different repository to manage versions and additions from multiple sources. You could then clone or download this repository and reference it using the `item_descriptions.root_directory` option. The pipelines are defined in item description files: YAML files that specify the processors to use to extract your desired facets.

Note

The item-generator outputs two things:

1. An item, including facets
2. An item ID to be applied to the asset

These are separated using the `namespace` argument on the output plugin.

Usage

The tool is called using the `asset_scanner` command:

usage: asset_scanner [-h] conf

Run the asset scanner as configured

positional arguments:
  conf        Path to a yaml configuration file

optional arguments:
  -h, --help  show this help message and exit

Example:

$ asset_scanner conf/conf.yml

User guide:

API

Core
- Decorators
- Configuration