CFA Creator: CFANetCDF
- class cfapyx.creator.CFACreateMixin[source]
Mixin class providing the Create methods for a CFA-netCDF dataset.
- _first_pass(agg_dims: list = None) tuple[source]
Perform a first pass across all provided files. Extracts the global attributes and information on all variables and dimensions into separate Python dictionaries. Also collects the set of files arranged by aggregated dimension coordinates, to be used later in constructing the CFA fragment_location properties.
- _second_pass(var_info: dict, non_aggregated: list) dict[source]
Second pass through a subset of the files (two of them) to collect the non-aggregated variables that will be stored in the CFA file.
- _collect_dim_info(ds, d: str, pure_dimensions: list, coord_variables: list, agg_dims: list = None, first_time: bool = False)[source]
Collect new info about each dimension. The collected attributes depend mostly on whether the dimension is pure (i.e. not a coordinate variable) or is a coordinate variable. Aggregated dimensions require collection of all array sequences that have a different start value. If the aggregation dimensions are known, arrays do not have to be collected from each file for non-aggregated dimensions.
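The distinction between pure and coordinate dimensions can be illustrated with a minimal sketch. This is not the cfapyx implementation: the function name `classify_dimensions` and its plain-list inputs are hypothetical, and the only rule assumed is the one stated in this reference, namely that a coordinate dimension has a variable of the same name while a pure dimension has a size only.

```python
def classify_dimensions(dim_names, var_names):
    """Split dimension names into (pure, coordinate) lists.

    Hypothetical helper: a dimension counts as a coordinate dimension
    when a variable with the same name exists, otherwise it is pure.
    """
    known_vars = set(var_names)
    pure = [d for d in dim_names if d not in known_vars]
    coord = [d for d in dim_names if d in known_vars]
    return pure, coord


# Illustrative names only; "bnds" has no matching variable, so it is pure.
pure_dims, coord_dims = classify_dimensions(
    ["time", "lat", "bnds"],
    ["time", "lat", "tas"],
)
```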
- _update_info(ncattr_obj, info: dict, new_info: dict) dict[source]
Update the information for a variable/dimension based on the current dataset. Certain properties are collected in lists, while others are explicitly defined and should be equal across all files. Others still may differ across files, in which case the concat_msg is applied, which usually directs the reader to inspect individual files for the correct value.
- _arrange_dimensions(dim_info: dict, agg_dims: list = None) dict[source]
Arrange the aggregation dimensions by ordering the start values collected from each file. Dimension arrays are aggregated into a single array once properly ordered, and the sizes of the fragments in each dimension are recorded in dim_info.
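The ordering step described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the library's code: `arrange_dimension` is a hypothetical name, each fragment is represented as a plain list of coordinate values, and fragments are assumed not to overlap.

```python
def arrange_dimension(fragments):
    """Order per-file coordinate arrays by start value and join them.

    Hypothetical helper. Returns the aggregated coordinate array and
    the size of each fragment, in the arranged order.
    """
    # Sort on the first element of each array (its "start" value).
    ordered = sorted(fragments, key=lambda values: values[0])
    # Record how many points each fragment contributes.
    sizes = [len(values) for values in ordered]
    # Flatten the ordered fragments into a single array.
    aggregated = [v for values in ordered for v in values]
    return aggregated, sizes


# Fragments arrive in arbitrary file order and may differ in length.
time_fragments = [[5, 6], [0, 1, 2], [3, 4]]
aggregated_time, fragment_sizes = arrange_dimension(time_fragments)
```

In this sketch the number of fragments along the dimension is simply `len(fragment_sizes)`.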
- _assemble_location(arranged_files: dict, dim_info: dict) dict[source]
Assemble the base CFA fragment_location from which all the locations for different variables are derived. Locations are defined by the number of dimensions, and follow the same pattern for definition as the fragment_shapes. The combinations of dimensions that require their own location and shape are recorded in cdim_opts.
- _determine_non_aggregated(var_info: dict, agg_dims: list) list[source]
Determine non-aggregated variables present in the fragment files. Non-aggregated variables are equivalent to the identical variables in kerchunk jargon. If the non-aggregated variables are later found to vary across the fragment files, an error will be raised.
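One way to picture this selection is as a filter on each variable's dimensions. The sketch below is an assumption about the idea, not a copy of the cfapyx logic: `find_non_aggregated` is a hypothetical name, and `var_dims` is a simplified stand-in for var_info that maps each variable name to its tuple of dimension names.

```python
def find_non_aggregated(var_dims, agg_dims):
    """Return variables whose dimensions include no aggregation dimension.

    Hypothetical helper operating on a simplified mapping of
    variable name -> tuple of dimension names.
    """
    agg = set(agg_dims)
    return [name for name, dims in var_dims.items() if not agg & set(dims)]


# Illustrative variables only.
var_dims = {
    "tas": ("time", "lat", "lon"),
    "time_bnds": ("time", "bnds"),
    "lat_bnds": ("lat", "bnds"),
}
non_aggregated = find_non_aggregated(var_dims, ["time"])
```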
- _determine_size_opts(var_info: dict, agg_dims: list) list[source]
Determine the combinations of dimensions from the information around each variable. Each combination requires a different location and shape fragment array variable in the final CFA-netCDF file.
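As a rough sketch of what collecting dimension combinations could look like, the example below gathers the distinct dimension tuples used by aggregated variables. The function name `collect_size_opts`, the simplified `var_dims` mapping, and the choice to consider only variables that span an aggregation dimension are all assumptions made for illustration.

```python
def collect_size_opts(var_dims, agg_dims):
    """Collect the distinct dimension combinations of aggregated variables.

    Hypothetical helper. Order of first appearance is preserved so the
    result is deterministic.
    """
    agg = set(agg_dims)
    opts = []
    for dims in var_dims.values():
        combo = tuple(dims)
        # Only variables spanning an aggregation dimension are fragmented.
        if agg & set(combo) and combo not in opts:
            opts.append(combo)
    return opts


# Two variables share one combination; a third needs its own.
size_opts = collect_size_opts(
    {
        "tas": ("time", "lat", "lon"),
        "pr": ("time", "lat", "lon"),
        "time_bnds": ("time", "bnds"),
        "lat_bnds": ("lat", "bnds"),
    },
    ["time"],
)
```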
- _accumulate_attrs(attrs: dict, ncattrs: dict) dict[source]
Accumulate attributes from the new source and the existing set. Fill value attributes are ignored, as these are handled elsewhere. If attributes are not equal across files, the concat_msg is used to indicate that data users should consult the source files for the correct values.
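The accumulation rule can be sketched as a dictionary merge. This is an illustrative approximation rather than the actual method body: `accumulate_attrs_sketch` is a hypothetical name, and treating `_FillValue` as the only fill value attribute is an assumption based on the usual netCDF naming. The default message is the one shown in the CFANetCDF signature.

```python
CONCAT_MSG = "See individual datasets for more information."


def accumulate_attrs_sketch(attrs, new_attrs, concat_msg=CONCAT_MSG):
    """Merge attributes from a new file into the accumulated set.

    Hypothetical helper: equal values are kept, conflicting values are
    replaced by concat_msg, and fill value attributes are skipped.
    """
    merged = dict(attrs)
    for key, value in new_attrs.items():
        if key == "_FillValue":
            # Assumed fill value attribute name; handled elsewhere.
            continue
        if key not in merged:
            merged[key] = value
        elif merged[key] != value:
            merged[key] = concat_msg
    return merged


first = {"units": "K", "history": "run A"}
second = {"units": "K", "history": "run B", "_FillValue": -999.0}
merged_attrs = accumulate_attrs_sketch(first, second)
```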
- class cfapyx.creator.CFAWriteMixin[source]
Mixin class providing the Write methods for a CFA-netCDF dataset.
- _write_dimensions()[source]
Write the collected dimensions in dim_info as new dimensions in the CFA-netCDF file. So-called pure dimensions, which have no variable component (no array of values), are defined with size alone, whereas coordinate dimensions (coordinate variables) have an associated variable component. The so-called f-dims are also created here as the fragmented size of each coordinate dimension.
Note: if a coordinate dimension is not fragmented, it still has an attributed f-dim, equal to 1.
- _write_variables()[source]
Non-aggregated variables are defined exactly the same as in the fragment files, while aggregated variables contain aggregated_data and aggregated_dimensions attributes, which link to the fragment array variables.
- _write_fragment_addresses()[source]
Create a fragment_address variable for each variable that is not dimensionless.
- _write_shape_dims(f_dims: dict)[source]
Construct the shape and location dimensions for each combination of dimensions stored in cdim_opts. This utilises the so-called f-dims previously created for each coordinate dimension.
- _write_fragment_shapes()[source]
Construct the fragment_shape variable part for each combination of dimensions stored in cdim_opts. This utilises the shape dimensions previously created.
- class cfapyx.creator.CFANetCDF(files: list, concat_msg: str = 'See individual datasets for more information.')[source]
CFA-netCDF file constructor class. Enables creation and writing of new CF-1.12 aggregations.
- create(updates: dict = None, removals: list = None, agg_dims: list = None) None[source]
Perform the operations and passes needed to accumulate the set of variable/dimension info and attributes to then construct a CFA-netCDF file.
- write(outfile: str) None[source]
Use the accumulated dimension/variable info and attributes to construct a CFA-netCDF file.
- property agg_dims
Display the aggregated dimensions identified on creation.
- property pure_dims
Display the ‘pure’ dimensions identified on creation. Pure dimensions are defined only by a size, with no array of values.
- property coord_dims
Display the coordinate dimensions identified on creation. Coordinate dimensions include an array of values for the dimension as a variable with the same name.
- property scalar_vars
Display the scalar variables identified on creation, which are single-valued and dimensionless.
- property aggregated_vars
Display the variables that vary across the aggregation dimensions.
- property identical_vars
Display the variables that do not vary across the aggregation dimensions and must therefore be identical across all files.
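The public workflow documented above (construct, create, inspect, write) might be combined as in the following sketch. The wrapper function `build_cfa` is hypothetical; it uses only the constructor, methods, and properties listed in this reference, and it imports cfapyx lazily so that the definition itself does not require the package to be installed.

```python
def build_cfa(files, outfile, agg_dims=None):
    """Aggregate `files` into a single CFA-netCDF file at `outfile`.

    Hypothetical convenience wrapper around the documented CFANetCDF API.
    """
    # Import path taken from the class listing: cfapyx.creator.CFANetCDF.
    from cfapyx.creator import CFANetCDF

    ds = CFANetCDF(files)

    # Accumulate variable/dimension info and attributes from the files.
    ds.create(agg_dims=agg_dims)

    # The properties summarise what was identified on creation.
    print("Aggregated dimensions:", ds.agg_dims)
    print("Pure dimensions:", ds.pure_dims)
    print("Coordinate dimensions:", ds.coord_dims)
    print("Aggregated variables:", ds.aggregated_vars)
    print("Identical variables:", ds.identical_vars)

    # Write the CFA-netCDF file using the accumulated information.
    ds.write(outfile)
    return ds
```

A call might look like `build_cfa(["a.nc", "b.nc"], "aggregation.nca", agg_dims=["time"])`, where the file names are placeholders rather than real datasets.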