CFA Creator: CFANetCDF

class cfapyx.creator.CFACreateMixin[source]

Mixin class for Create methods for a CFA-netCDF dataset.

_first_pass(agg_dims: list = None) → tuple[source]: Perform a first pass across all provided files. Extracts the global attributes and information on all variables and dimensions into separate python dictionaries. Also collects the set of files arranged by aggregated dimension coordinates, to be used later in constructing the CFA fragment_location properties.

_second_pass(var_info: dict, non_aggregated: list) → dict[source]: Second pass through a subset (two) of the files to collect the non-aggregated variables, which will be stored in the CFA file.

_collect_dim_info(ds, d: str, pure_dimensions: list, coord_variables: list, agg_dims: list = None, first_time: bool = False)[source]: Collect new info about each dimension. The collected attributes depend mostly on whether the dimension is pure (i.e. not a coordinate variable) or is a coordinate variable. Aggregated dimensions require collection of all array sequences that have a different start value. If the aggregation dimensions are known, arrays do not have to be collected from each file for the non-aggregated dimensions.

_update_info(ncattr_obj, info: dict, new_info: dict) → dict[source]: Update the information for a variable/dimension based on the current dataset. Certain properties are collected in lists, while others are explicitly defined and should be equal across all files. Others still may differ across files, in which case the concat_msg is applied, which usually directs users to inspect the individual files for the correct value.

_arrange_dimensions(dim_info: dict, agg_dims: list = None) → dict[source]: Arrange the aggregation dimensions by ordering the start values collected from each file. Dimension arrays are aggregated to a single array once properly ordered, and the sizes of the fragments in each dimension are recorded in the dim_info.
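
The ordering step described above can be pictured with a minimal sketch. This is not the cfapyx implementation: the function name arrange_dimension and its input layout (one list of coordinate values per fragment) are hypothetical, and plain lists stand in for arrays.

```python
def arrange_dimension(fragments):
    """Order per-fragment coordinate arrays by start value and merge them.

    `fragments` is a hypothetical input: one list of coordinate values
    for each distinct fragment along a single aggregated dimension.
    """
    # Sort the fragment arrays by their first (start) value.
    ordered = sorted(fragments, key=lambda arr: arr[0])

    aggregated = []  # single merged coordinate array
    sizes = []       # number of values contributed by each fragment
    for arr in ordered:
        aggregated.extend(arr)
        sizes.append(len(arr))
    return aggregated, sizes


# Three fragments covering different time ranges, supplied out of order.
time_fragments = [[4, 5], [0, 1, 2, 3], [6, 7, 8]]
time_values, time_sizes = arrange_dimension(time_fragments)
```

Here time_values holds the merged coordinate array and time_sizes the per-fragment sizes in aggregation order.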

_assemble_location(arranged_files: dict, dim_info: dict) → dict[source]: Assemble the base CFA fragment_location from which all the locations for different variables are derived. Locations are defined by the number of dimensions, and follow the same pattern for definition as the fragment_shapes. The combinations of dimensions that require their own location and shape are recorded in cdim_opts.

_determine_non_aggregated(var_info: dict, agg_dims: list) → list[source]: Determine non-aggregated variables present in the fragment files. Non-aggregated variables are equivalent to the "identical" variables in kerchunk jargon. If the non-aggregated variables are later found to vary across the fragment files, an error will be raised.
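
One way to read this selection is as a filter on each variable's dimensions. The sketch below is an illustration only: the "dims" key in var_info and the function name are assumptions, not the structure cfapyx actually stores.

```python
def determine_non_aggregated(var_info, agg_dims):
    """Return names of variables that use none of the aggregated dimensions.

    `var_info` maps variable name -> metadata dict; the "dims" key is a
    hypothetical layout chosen for this sketch.
    """
    non_aggregated = []
    for name, meta in var_info.items():
        dims = meta.get("dims", ())
        # A variable is non-aggregated if no aggregated dimension appears
        # among its dimensions.
        if not any(d in agg_dims for d in dims):
            non_aggregated.append(name)
    return non_aggregated


var_info = {
    "tas": {"dims": ("time", "lat", "lon")},
    "lat_bnds": {"dims": ("lat", "bnds")},
    "lon_bnds": {"dims": ("lon", "bnds")},
}
identical = determine_non_aggregated(var_info, agg_dims=["time"])
```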

_determine_size_opts(var_info: dict, agg_dims: list) → list[source]: Determine the combinations of dimensions from the information around each variable. Each combination requires a different location and shape fragment array variable in the final CFA-netCDF file.
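
As a rough sketch of what "combinations of dimensions" means here, the example below collects each distinct dimension tuple used by a variable that touches an aggregated dimension. The "dims" key and the function name are hypothetical and chosen only for illustration.

```python
def determine_size_opts(var_info, agg_dims):
    """Collect the distinct dimension combinations of aggregated variables."""
    combos = []
    for meta in var_info.values():
        dims = tuple(meta.get("dims", ()))
        # Only variables that use an aggregated dimension need fragment
        # arrays; keep the first occurrence of each combination.
        if any(d in agg_dims for d in dims) and dims not in combos:
            combos.append(dims)
    return combos


var_info = {
    "tas": {"dims": ("time", "lat", "lon")},
    "pr": {"dims": ("time", "lat", "lon")},
    "time_bnds": {"dims": ("time", "bnds")},
    "lat_bnds": {"dims": ("lat", "bnds")},
}
size_opts = determine_size_opts(var_info, agg_dims=["time"])
```

In this example tas and pr share one combination, so only two sets of location and shape variables would be needed.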

_accumulate_attrs(attrs: dict, ncattrs: dict) → dict[source]: Accumulate attributes from the new source and the existing set. Ignore fill value attributes as these are handled elsewhere. If attributes are not equal across files, the concat_msg is used to indicate that data users should consult the source files for the correct values.
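
The merging rule can be sketched in a few lines. This is a simplified illustration under stated assumptions: the attribute names treated as fill values (FILL_ATTRS) and the function name are hypothetical, while the default message is the concat_msg default given in the CFANetCDF signature.

```python
# Assumed names of fill-value attributes; the real set may differ.
FILL_ATTRS = ("_FillValue", "missing_value")

DEFAULT_MSG = "See individual datasets for more information."


def accumulate_attrs(attrs, ncattrs, concat_msg=DEFAULT_MSG):
    """Merge attributes from a new file into the accumulated set."""
    merged = dict(attrs)
    for key, value in ncattrs.items():
        if key in FILL_ATTRS:
            continue  # fill values are assumed to be handled elsewhere
        if key not in merged:
            merged[key] = value       # first time this attribute is seen
        elif merged[key] != value:
            merged[key] = concat_msg  # values disagree across files
    return merged


file_a = {"institution": "X", "history": "created 2020", "_FillValue": -999}
file_b = {"institution": "X", "history": "created 2021"}
merged = accumulate_attrs(accumulate_attrs({}, file_a), file_b)
```

Attributes that agree are kept as-is, while the differing history attribute is replaced by the message.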

class cfapyx.creator.CFAWriteMixin[source]

Mixin class for Write methods for a CFA-netCDF dataset.

_write_dimensions()[source]

Write the collected dimensions in dim_info as new dimensions in the CFA-netCDF file. So-called pure dimensions which have no variable component (no array of values) are defined with size alone, whereas coordinate dimensions (coordinate variables) have an associated variable component. The so-called f-dims are also created here as the fragmented size of each coordinate dimension.

Note: if a coordinate dimension is not fragmented, it still has an attributed f-dim, equal to 1.
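
The relationship between a coordinate dimension and its f-dim can be illustrated as follows. The sketch assumes a hypothetical mapping from dimension name to the list of fragment sizes along that dimension, and the f_ prefix used for the resulting names is likewise an assumption.

```python
def f_dim_sizes(fragment_sizes):
    """Return the fragmented size (number of fragments) per coordinate dimension.

    `fragment_sizes` is a hypothetical mapping of dimension name to the
    list of fragment sizes along that dimension.
    """
    # An unfragmented dimension has a single fragment, so its f-dim is 1.
    return {f"f_{dim}": max(1, len(sizes)) for dim, sizes in fragment_sizes.items()}


f_dims = f_dim_sizes({"time": [4, 2, 3], "lat": [180], "lon": [360]})
```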

_write_variables()[source]: Non-aggregated variables are defined exactly the same as in the fragment files, while aggregated variables contain aggregated_data and aggregated_dimensions attributes, which link to the fragment array variables.

_write_fragment_addresses()[source]: Create a fragment_address variable for each variable that is not dimensionless.

_write_shape_dims(f_dims: dict)[source]: Construct the shape and location dimensions for each combination of dimensions stored in cdim_opts. This utilises the so-called f-dims previously created for each coordinate dimension.

_write_fragment_shapes()[source]: Construct the fragment_shape variable part for each combination of dimensions stored in cdim_opts. This utilises the shape dimensions previously created.
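
For intuition, a fragment shape array for one combination of dimensions can be pictured as one row of fragment sizes per dimension, padded to a common width. The sketch below assumes that padded two-dimensional layout; the function name, the input mapping and the use of None as the padding value are all hypothetical.

```python
def fragment_shape(dims, fragment_sizes, fill=None):
    """Build a padded 2D table of fragment sizes for a dimension combination."""
    # One row per dimension, listing the fragment sizes along it.
    rows = [list(fragment_sizes[d]) for d in dims]
    # Pad the shorter rows so that every row has the same length.
    width = max(len(row) for row in rows)
    return [row + [fill] * (width - len(row)) for row in rows]


sizes = {"time": [4, 2, 3], "lat": [180], "lon": [360]}
shape = fragment_shape(("time", "lat", "lon"), sizes)
```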

_write_aggregated_variable(var: str, meta: dict, agg_dims: str, agg_data: str)[source]

Create the netCDF parameters required for an aggregated variable.

Note: The dimensions and variables referenced in agg_data need to have already been defined for the dataset by this point.

_write_nonagg_variable(var: str, meta: dict)[source]: Create a non-aggregated variable for the CFA-netCDF file. If this variable has some attributed data (which it should), the data is set for this variable in the new file.

class cfapyx.creator.CFANetCDF(files: list, concat_msg: str = 'See individual datasets for more information.')[source]

CFA-netCDF file constructor class that enables creation and writing of new CF-1.12 aggregations.

create(updates: dict = None, removals: list = None, agg_dims: list = None) → None[source]: Perform the operations and passes needed to accumulate the set of variable/dimension info and attributes to then construct a CFA-netCDF file.

write(outfile: str) → None[source]: Use the accumulated dimension/variable info and attributes to construct a CFA-netCDF file.
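
A typical create-then-write sequence might look like the sketch below. It uses only the constructor, create and write signatures listed in this reference; the wrapper function build_aggregation is hypothetical, and the import is deferred so that the sketch can be defined without cfapyx installed.

```python
def build_aggregation(files, outfile, agg_dims=None):
    """Create a CFA-netCDF aggregation from `files` and write it to `outfile`."""
    # Deferred import: cfapyx is only required when the function is called.
    from cfapyx.creator import CFANetCDF

    ds = CFANetCDF(files)         # fragment files to aggregate
    ds.create(agg_dims=agg_dims)  # accumulate variable/dimension info
    ds.write(outfile)             # construct the CFA-netCDF file
    return ds
```

A call such as build_aggregation(["a.nc", "b.nc"], "aggregation.nca", agg_dims=["time"]) would then produce the output file; the file names here are placeholders.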

display_attrs()[source]: Display the global attributes consolidated in the aggregation process.

display_variables()[source]: Display the variables and some basic properties about each.

display_dimensions()[source]: Display the dimensions and some basic properties about each.

display_variable(var)[source]: Handler for displaying information about a variable.

display_dimension(dim)[source]: Handler for displaying information about a dimension.

_display_item(keyset)[source]: Display the information about a dimension/variable.

property agg_dims: Display the aggregated dimensions identified on creation.

property pure_dims: Display the ‘pure’ dimensions identified on creation. Pure dimensions are defined only by a size, with no array of values.

property coord_dims: Display the coordinate dimensions identified on creation. Coordinate dimensions include an array of values for the dimension as a variable with the same name.

property scalar_vars: Display the scalar variables identified on creation, which are single-valued and dimensionless.

property aggregated_vars: Display the variables that vary across the aggregation dimensions.

property identical_vars: Display the variables that do not vary across the aggregation dimensions and must therefore be identical across all files.

_filter_files(files: list) → list[source]: Filter the set of files to identify the trailing dimension indicative of multiple file locations. Also identifies the length of the longest filename to be used later when storing numpy string arrays.

_call_file(file: str) → Dataset[source]: Open the file as a netCDF dataset. If there are multiple filenames provided, use the first file. Also determine the longest filename to be used to define the location parameter later.