CFA Creator: CFANetCDF
- class cfapyx.creator.CFACreateMixin[source]
Mixin class for Create methods for a CFA-netCDF dataset.
- _first_pass(agg_dims: list = None) tuple [source]
Perform a first pass across all provided files. Extracts the global attributes and information on all variables and dimensions into separate Python dictionaries. Also collects the set of files arranged by aggregated dimension coordinates, to be used later in constructing the CFA fragment_location properties.
- _second_pass(var_info: dict, non_aggregated: list) dict [source]
Second pass through a subset of the files (2) to collect non-aggregated variables which will be stored in the CFA file.
- _collect_dim_info(ds, d: str, pure_dimensions: list, coord_variables: list, agg_dims: list = None, first_time: bool = False)[source]
Collect new info about each dimension. The collected attributes depend mostly on whether the dimension is pure (i.e. not a coordinate variable) or a coordinate variable. Aggregated dimensions require collection of all array sequences that have a different start value. If the aggregation dimensions are known, we do not have to collect arrays from each file for non-aggregated dimensions.
- _update_info(ncattr_obj, info: dict, new_info: dict) dict [source]
Update the information for a variable/dimension based on the current dataset. Certain properties are collected in lists, while others are explicitly defined and should be equal across all files. Others still may differ across files, in which case the concat_msg is applied, which usually directs the reader to inspect individual files for the correct value.
- _arrange_dimensions(dim_info: dict, agg_dims: list = None) dict [source]
Arrange the aggregation dimensions by ordering the start values collected from each file. Dimension arrays are aggregated to a single array once properly ordered, and the sizes of the fragments in each dimension are recorded in the dim_info.
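The ordering step described above can be illustrated with a minimal sketch. It is not the cfapyx implementation: the function name arrange_dimension and the plain-list input are assumptions made for illustration, and arrays repeating an already-seen start value are simply dropped.

```python
def arrange_dimension(fragments):
    """Order per-file coordinate arrays by start value and merge them.

    `fragments` holds one coordinate array (a list) per file. This input
    shape is hypothetical and not cfapyx's internal structure.
    Returns (aggregated_array, fragment_sizes).
    """
    # Keep one array per distinct start value (first occurrence wins).
    unique = {}
    for arr in fragments:
        unique.setdefault(arr[0], list(arr))

    # Order the arrays by their start value.
    ordered = [unique[start] for start in sorted(unique)]

    # Concatenate into a single array and record each fragment's size.
    aggregated = [value for arr in ordered for value in arr]
    sizes = [len(arr) for arr in ordered]
    return aggregated, sizes


# Four files, one of which repeats the start value 0.
time_fragments = [[6, 7, 8], [0, 1, 2], [3, 4, 5], [0, 1, 2]]
aggregated, sizes = arrange_dimension(time_fragments)
print(aggregated)
print(sizes)
```

Here the three distinct fragments are merged in start-value order, and the recorded sizes give the fragment length along the dimension.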
- _assemble_location(arranged_files: dict, dim_info: dict) dict [source]
Assemble the base CFA fragment_location from which all the locations for different variables are derived. Locations are defined by the number of dimensions, and follow the same pattern for definition as the fragment_shapes. The combinations of dimensions that require their own location and shape are recorded in cdim_opts.
- _determine_non_aggregated(var_info: dict, agg_dims: list) list [source]
Determine non-aggregated variables present in the fragment files. Non-aggregated variables are equivalent to the identical variables from kerchunk jargon. If the non-aggregated variables are later found to vary across the fragment files, an error will be raised.
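One plausible way to make this split is sketched below. The rule used here (a variable is non-aggregated when none of its dimensions is an aggregation dimension) is an assumption for illustration, as are the function name split_variables and the var_dims mapping; the actual cfapyx logic may differ.

```python
def split_variables(var_dims, agg_dims):
    """Split variables into aggregated and non-aggregated lists.

    `var_dims` maps a variable name to its tuple of dimension names
    (a hypothetical, simplified stand-in for var_info).
    """
    aggregated, non_aggregated = [], []
    for name, dims in var_dims.items():
        # Assumed rule: any aggregation dimension makes the variable aggregated.
        if any(d in agg_dims for d in dims):
            aggregated.append(name)
        else:
            non_aggregated.append(name)
    return aggregated, non_aggregated


var_dims = {
    "tas": ("time", "lat", "lon"),
    "lat_bnds": ("lat", "bnds"),
    "height": (),  # dimensionless (scalar) variable
}
aggregated_vars, non_aggregated_vars = split_variables(var_dims, agg_dims=["time"])
print(aggregated_vars)
print(non_aggregated_vars)
```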
- _determine_size_opts(var_info: dict, agg_dims: list) list [source]
Determine the combinations of dimensions from the information around each variable. Each combination requires a different location and shape fragment array variable in the final CFA-netCDF file.
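As a rough illustration of collecting dimension combinations, the sketch below gathers the distinct dimension tuples used by a set of aggregated variables. The function name dimension_combinations and the input mapping are hypothetical and do not reflect cfapyx's internal data structures.

```python
def dimension_combinations(aggregated_var_dims):
    """Return the distinct dimension tuples, in first-seen order.

    `aggregated_var_dims` maps each aggregated variable name to its
    tuple of dimension names (hypothetical input).
    """
    combos = []
    for dims in aggregated_var_dims.values():
        dims = tuple(dims)
        if dims not in combos:
            combos.append(dims)
    return combos


aggregated_var_dims = {
    "tas": ("time", "lat", "lon"),
    "pr": ("time", "lat", "lon"),   # shares a combination with tas
    "time_bnds": ("time", "bnds"),
}
combos = dimension_combinations(aggregated_var_dims)
print(combos)
```

In this example two combinations are found, so two sets of location and shape fragment array variables would be needed.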
- _accumulate_attrs(attrs: dict, ncattrs: dict) dict [source]
Accumulate attributes from the new source and the existing set. Ignore fill value attributes as these are handled elsewhere. If attributes are not equal across files, the concat_msg is used to indicate where data users should seek out the source files for correct values.
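The accumulation rule above might look roughly like the following sketch. It assumes attributes are plain Python values and that the fill value attribute is named _FillValue (the usual netCDF name, but an assumption here); the function accumulate_attrs is a stand-in, not the cfapyx method itself.

```python
# Default message taken from the CFANetCDF constructor signature.
CONCAT_MSG = "See individual datasets for more information."


def accumulate_attrs(attrs, new_attrs, concat_msg=CONCAT_MSG):
    """Merge `new_attrs` into a copy of `attrs`.

    Attributes that differ between sources are replaced by `concat_msg`.
    """
    merged = dict(attrs)
    for key, value in new_attrs.items():
        if key == "_FillValue":  # assumed name; fill values are handled elsewhere
            continue
        if key not in merged:
            merged[key] = value
        elif merged[key] != value:
            merged[key] = concat_msg
    return merged


file_a = {"institution": "Example Institute", "history": "created 2020", "_FillValue": -999}
file_b = {"institution": "Example Institute", "history": "created 2021"}

merged = accumulate_attrs(accumulate_attrs({}, file_a), file_b)
print(merged)
```

The institution attribute agrees across both sources and is kept, while the differing history attribute is replaced by the message.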
- class cfapyx.creator.CFAWriteMixin[source]
Mixin class for Write methods for a CFA-netCDF dataset.
- _write_dimensions()[source]
Write the collected dimensions in dim_info as new dimensions in the CFA-netCDF file. So-called pure dimensions, which have no variable component (no array of values), are defined with size alone, whereas coordinate dimensions (coordinate variables) have an associated variable component. The so-called f-dims are also created here as the fragmented size of each coordinate dimension. Note: if a coordinate dimension is not fragmented, it still has an attributed f-dim, equal to 1.
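To make the f-dim idea concrete, the sketch below derives an f-dim size for each coordinate dimension from its list of fragment sizes. The f_ name prefix, the function name fragment_dims and the input mapping are all hypothetical choices for illustration.

```python
def fragment_dims(fragment_sizes):
    """Map each coordinate dimension to the size of its f-dim.

    `fragment_sizes` maps a dimension name to the list of fragment
    lengths along that dimension (hypothetical input). The f-dim size
    is the number of fragments, so an unfragmented dimension gets 1.
    """
    return {f"f_{name}": len(sizes) for name, sizes in fragment_sizes.items()}


fragment_sizes = {
    "time": [3, 3, 3],  # split across three fragments
    "lat": [10],        # not fragmented
    "lon": [20],        # not fragmented
}
f_dims = fragment_dims(fragment_sizes)
print(f_dims)
```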
- _write_variables()[source]
Non-aggregated variables are defined exactly the same as in the fragment files, while aggregated variables contain aggregated_data and aggregated_dimensions attributes, which link to the fragment array variables.
- _write_fragment_addresses()[source]
Create a fragment_address variable for each variable which is not dimension-less.
- _write_shape_dims(f_dims: dict)[source]
Construct the shape and location dimensions for each combination of dimensions stored in cdim_opts. This utilises the so-called f-dims previously created for each coordinate dimension.
- _write_fragment_shapes()[source]
Construct the fragment_shape variable part for each combination of dimensions stored in cdim_opts. This utilises the shape dimensions previously created.
- class cfapyx.creator.CFANetCDF(files: list, concat_msg: str = 'See individual datasets for more information.')[source]
CFA-netCDF file constructor class. Enables creation and writing of new CF-1.12 aggregations.
- create(updates: dict = None, removals: list = None, agg_dims: list = None) None [source]
Perform the operations and passes needed to accumulate the set of variable/dimension info and attributes to then construct a CFA-netCDF file.
- write(outfile: str) None [source]
Use the accumulated dimension/variable info and attributes to construct a CFA-netCDF file.
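A typical create-then-write workflow, based only on the signatures listed here, could be wrapped as in the sketch below. The helper build_aggregation is hypothetical, and the import is done inside the function so the definition can be loaded without cfapyx installed.

```python
def build_aggregation(files, outfile, agg_dims=None):
    """Create a CFA-netCDF aggregation from `files` and write it to `outfile`.

    Hypothetical convenience wrapper around the documented
    CFANetCDF(files).create(...) and .write(outfile) calls.
    """
    # Lazy import: only needed when the function is actually called.
    from cfapyx.creator import CFANetCDF

    ds = CFANetCDF(files)          # uses the default concat_msg
    ds.create(agg_dims=agg_dims)   # accumulate variable/dimension info
    ds.write(outfile)              # write the CFA-netCDF file
    return ds
```

For example, build_aggregation(["a.nc", "b.nc"], "aggregation.nca", agg_dims=["time"]) would aggregate two fragment files along time; the file names here are placeholders.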
- property agg_dims
Display the aggregated dimensions identified on creation.
- property pure_dims
Display the ‘pure’ dimensions identified on creation. Pure dimensions are defined only by a size, with no array of values.
- property coord_dims
Display the coordinate dimensions identified on creation. Coordinate dimensions include an array of values for the dimension as a variable with the same name.
- property scalar_vars
Display the scalar variables identified on creation, which are single-valued and dimensionless.
- property aggregated_vars
Display the variables that vary across the aggregation dimensions.
- property identical_vars
Display the variables that do not vary across the aggregation dimensions and must therefore be identical across all files.