Extra Details for CEDA Staff

Last Updated: 4th March 2025

The following content is intended specifically for CEDA staff users, and involves integration with other packages.

CCI: Fill group using Moles ESGF Tag results

For CCI projects, it can be faster and easier to initialise an empty group and then fill it using the esgf_drs.json file created by the cci-tag-scanner package. The process for doing this is documented here. See the cci-tag-scanner repo for instructions on how to run the moles tagging script.

1. Create an empty group

This can be done interactively or using the console.

$ padocc new -G my_group

Or interactively…

>>> from padocc import GroupOperation

>>> # Access your working directory from the external environment - if already defined
>>> import os
>>> workdir = os.environ.get("WORKDIR")

>>> my_group = GroupOperation('my_group', workdir)
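
Because os.environ.get returns None when WORKDIR is not set, it can help to check the value before constructing the group. The following is a minimal sketch using only the standard library; the helper name resolve_workdir is hypothetical and is not part of padocc.

```python
import os
from pathlib import Path


def resolve_workdir(env_var="WORKDIR"):
    """Return the directory named by env_var as an absolute Path.

    Hypothetical helper (not part of padocc): it fails early with a clear
    message instead of passing None on to GroupOperation.
    """
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set in the environment")
    path = Path(value).expanduser().resolve()
    if not path.is_dir():
        raise NotADirectoryError(f"{env_var} points to a missing directory: {path}")
    return path
```

The returned path could then be passed to GroupOperation in place of the raw environment value, for example as str(resolve_workdir()).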

2. Add new projects using the moles_esgf.json contents

Either the file contents or the filepath can be provided, but in either case this step must be done interactively.

>>> my_group.add_project('moles_esgf.json', moles_tags=True)
INFO [group-operation]: Rejected UNKNOWN_DRS - /neodc/esacci/fire/data/burned_area/Sentinel3_SYN/pixel/v1.1 - not all files are friendly.
INFO [group-operation]: Rejected esacci.fire.mon.l3s.ba.multi-sensor.multi-platform.syn.v1-1.pixel - not all files are friendly.
DEBUG [group-operation]: Creating file "main.txt"
DEBUG [group-operation]: Creating operator for project esacci.fire.mon.l4.ba.multi-sensor.multi-platform.syn.v1-1.grid
DEBUG [group-operation]: Constructing the config file for esacci.fire.mon.l4.ba.multi-sensor.multi-platform.syn.v1-1.grid
DEBUG [group-operation]: Creating file "base-cfg.json"
DEBUG [group-operation]: Skipped setting value in detail-cfg.json
DEBUG [group-operation]: Creating file "allfiles.txt"
DEBUG [group-operation]: Skipped setting value in status_log.csv
DEBUG [group-operation]: Skipped setting value in kj1.1a.json
DEBUG [group-operation]: No 1.3.2 related file issues.
DEBUG [group-operation]: Skipped setting value in faultlist.csv
DEBUG [group-operation]: Skipped setting value in datasets.csv
>>> my_group[0]
DEBUG [group-operation]: Creating operator for project esacci.fire.mon.l4.ba.multi-sensor.multi-platform.syn.v1-1.grid
DEBUG [group-operation]: Creating file "status_log.csv"
DEBUG [group-operation]: content length: 10
esacci.fire.mon.l4.ba.multi-sensor.multi-platform.syn.v1-1.grid:
File count: 10
Group: my_group
Phase: init
Revision: kj1.1

In the above example, the UNKNOWN_DRS entry was rejected because no DRS was issued for it (this normally indicates non-data files such as READMEs). The first named DRS was also rejected, because it contained only a set of .tar.gz files, which padocc cannot process. The third entry, with the DRS esacci.fire.mon.l4.ba.multi-sensor.multi-platform.syn.v1-1.grid, was identified as valid, and all the subsequent files were created for it.

It was then possible to access this project as the 0th member of the group, with 10 files identified from the input source. In this way, many projects can be added to a group from a single moles tags file. Multiple groups can also be merged, which adds further options for creating groups.
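
To get a rough idea of which entries are likely to be rejected before calling add_project, the tags file can be pre-screened. The sketch below rests on two assumptions that the text above does not confirm: that the JSON maps each DRS identifier to a list of file paths, and that "friendly" can be approximated by file extension. The function preview_tags and the suffix list are hypothetical, and padocc's own acceptance checks may differ.

```python
# Assumption: archive files like these cannot be processed; adjust as needed.
UNPROCESSABLE_SUFFIXES = (".tar.gz", ".tar", ".zip")


def preview_tags(tags):
    """Split an assumed {drs_id: [file paths]} mapping into two lists.

    Returns (accepted, rejected) DRS identifiers. An entry is rejected if it
    has no issued DRS (key "UNKNOWN_DRS"), has no files, or contains any file
    with an unprocessable suffix.
    """
    accepted, rejected = [], []
    for drs_id, files in tags.items():
        unknown = drs_id == "UNKNOWN_DRS"
        unfriendly = any(str(f).endswith(UNPROCESSABLE_SUFFIXES) for f in files)
        if unknown or unfriendly or not files:
            rejected.append(drs_id)
        else:
            accepted.append(drs_id)
    return accepted, rejected
```

With the file loaded via json.load, calling preview_tags on the result gives two lists that can be compared against the "Rejected" messages logged by add_project.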

CCI: Alter dataset attributes using CCI Tagger JSONs

In many cases the CCI tagger JSON files contain expected default values for different datasets. Padocc now provides a per-project apply_defaults method, which can be used to reassign values in the cloud dataset.

Unlike group creation in the first CCI-specific case, this step can only be performed via the interactive shell. It requires opening the JSON file and passing the correct content:

>>> import json
>>> with open('fire_syn_v1.1_input.json') as f:
...     refs = json.load(f)

>>> defaults = refs['defaults']
>>> p = my_group[0]
>>> p.apply_defaults(defaults)

This will apply the default attributes to the ‘dataset’ filehandler, which is determined by the cloud_format. To apply these attributes to a specific product instead, use the target kwarg, e.g. kfile or zstore. This function can also be used to remove specific values, which is especially useful if you’re using the defaults to correct a naming issue.

>>> # Quick example: read the current value of a property from the main dataset and re-apply it under a new name.
>>> defaults = {'PRODUCT_VERSION': p.dataset_attributes['product_version']}
>>> p.apply_defaults(defaults, remove=['product_version'])

This will effectively rename the product_version attribute to PRODUCT_VERSION. Changes made through the apply_defaults method also automatically update the base CFA dataset alongside the target dataset.
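
The effect of combining defaults with remove can be illustrated on a plain dictionary. This is only a sketch of the behaviour described above, not padocc's implementation: the function apply_defaults_sketch is hypothetical, the sample attribute values are made up, and how padocc orders the two steps internally is an assumption.

```python
def apply_defaults_sketch(attrs, defaults, remove=None):
    """Return a copy of attrs with defaults set and the keys in remove dropped.

    Illustration only: mimics the described effect on a plain dict and
    leaves the input mapping unchanged.
    """
    updated = dict(attrs)
    updated.update(defaults)          # set or overwrite the default values
    for key in remove or []:
        updated.pop(key, None)        # drop unwanted keys if present
    return updated


# Made-up sample attributes, used only to show the rename pattern.
attrs = {"product_version": "v1.1", "title": "Fire CCI"}
renamed = apply_defaults_sketch(
    attrs,
    {"PRODUCT_VERSION": attrs["product_version"]},
    remove=["product_version"],
)
print(renamed)  # {'title': 'Fire CCI', 'PRODUCT_VERSION': 'v1.1'}
```

Reading the old value before removing the key is what makes this a rename rather than a deletion.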