cfapyx Terminology
This page lists the terminology used throughout cfapyx and the equivalent terms in the CF conventions. Most terms from the CF conventions for ‘Aggregated variables’ are preserved within this package, with only a few additional terms required.
Note
The page ‘cfapyx Usage and Options’ covers keyword arguments to provide to xarray.open_dataset when using the CFA engine.
This page is specifically for the terms and conventions used within the package, and will enable developers to understand the meaning of terms across multiple functions and classes.
Fragments, Chunks and Partitions
cfapyx has three specific terms for dealing with portions of arrays in different contexts. Directly from the CF conventions, a Fragment File is a source file for an aggregated variable. The term Fragment therefore refers to an array from one of these source files which constitutes part of an aggregation.
Chunks refers to the Dask chunks provided by the user with the chunks={} argument. If no chunks are specified, the Dask array defined with the Fragments as individual objects will simply use each Fragment as a Chunk, so in that case the two terms are equivalent.
Alternatively, if a chunk scheme is given to Dask (which in most cases will not match the Fragment scheme), then additional steps must be taken to optimise the retrieval of data.
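The mismatch between the two schemes can be illustrated along a single dimension. The following is a minimal sketch, not cfapyx code: the helper names `boundaries` and `fragments_per_chunk` and the sizes used are assumptions made for illustration. It shows that when no separate chunk scheme is given each Chunk maps to exactly one Fragment, whereas a different chunk scheme makes some Chunks span several Fragments.

```python
from itertools import accumulate


def boundaries(sizes):
    """Return (start, stop) extents for consecutive blocks of the given sizes."""
    stops = list(accumulate(sizes))
    starts = [0] + stops[:-1]
    return list(zip(starts, stops))


def fragments_per_chunk(fragment_sizes, chunk_sizes):
    """Map each chunk index to the fragment indices it overlaps (one dimension)."""
    if sum(fragment_sizes) != sum(chunk_sizes):
        raise ValueError("fragment and chunk schemes must span the same length")
    frags = boundaries(fragment_sizes)
    overlap = {}
    for c, (c0, c1) in enumerate(boundaries(chunk_sizes)):
        # Two half-open intervals overlap when each starts before the other ends.
        overlap[c] = [f for f, (f0, f1) in enumerate(frags) if f0 < c1 and c0 < f1]
    return overlap


# Illustrative values: three fragments of length 4, two chunks of length 6.
mapping = fragments_per_chunk([4, 4, 4], [6, 6])
print(mapping)  # {0: [0, 1], 1: [1, 2]}

# With the chunk scheme equal to the fragment scheme, each chunk is one fragment.
print(fragments_per_chunk([4, 4, 4], [4, 4, 4]))  # {0: [0], 1: [1], 2: [2]}
```

In the first case the middle fragment is needed by both chunks, which is the situation where extra care is required to avoid reading the same source file inefficiently.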
The above figure shows a case where a Chunk scheme is provided, as well as the underlying Fragment structure. The convention within this package is to refer to any array section as a Partition. Both Fragments and Chunks are considered to be Partitions. A Partition can take any shape within the given space (see Terms in cfapyx below). Originally it was thought that the Chunk and Fragment schemes should be allowed to overlap, and that a nested Dask array could be used to handle the various shapes and positions, but it was later shown to be much simpler to match the provided chunk structure to the existing fragment structure, so each chunk is composed of exactly one fragment.
In cfapyx, all Fragment/Chunk/Partition objects inherit from a generalised ArrayPartition class which defines certain lazy behaviour. For the case in the figure above, CFA would create a Dask array from several ChunkWrapper objects, one for each (orange) Dask chunk. These ChunkWrapper instances would each contain a Dask array created from several CFAPartition objects, each holding some extent of a Fragment. This means there will be multiple low-level objects for each Fragment, but their extents will not overlap, so every data point will be covered by exactly one low-level object and exactly one ChunkWrapper (now considered the second level).
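The two-level layout described above can be sketched in one dimension. This is a simplified illustration under stated assumptions, not the cfapyx implementation: the function `low_level_partitions` is hypothetical, plain tuples stand in for the ChunkWrapper and CFAPartition objects, and the sizes are made up. Each chunk is split into pieces, where each piece is the intersection of that chunk with one fragment.

```python
from itertools import accumulate


def extents(sizes):
    """Return (start, stop) extents for consecutive blocks of the given sizes."""
    stops = list(accumulate(sizes))
    return list(zip([0] + stops[:-1], stops))


def low_level_partitions(fragment_sizes, chunk_sizes):
    """Return {chunk_index: [(fragment_index, start, stop), ...]}.

    Each entry is the part of one fragment that falls inside one chunk,
    expressed in coordinates of the full array.
    """
    layout = {}
    for c, (c0, c1) in enumerate(extents(chunk_sizes)):
        pieces = []
        for f, (f0, f1) in enumerate(extents(fragment_sizes)):
            start, stop = max(c0, f0), min(c1, f1)
            if start < stop:  # keep only non-empty intersections
                pieces.append((f, start, stop))
        layout[c] = pieces
    return layout


# Illustrative values: three fragments of length 4, two chunks of length 6.
layout = low_level_partitions([4, 4, 4], [6, 6])
print(layout)  # {0: [(0, 0, 4), (1, 4, 6)], 1: [(1, 6, 8), (2, 8, 12)]}

# Count how many low-level pieces cover each data point.
coverage = [0] * 12
for pieces in layout.values():
    for _, start, stop in pieces:
        for i in range(start, stop):
            coverage[i] += 1
print(all(n == 1 for n in coverage))  # True
```

Fragment 1 appears in two pieces, one per chunk, yet no data point is covered twice, which matches the non-overlapping property stated above.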
Terms in cfapyx
fragment_size_per_dim
: The non-padded fragment sizes for each dimension (fragmented or otherwise).
fragment space
: The coordinate system that refers to individual fragments. Each coordinate (i, j, k, …) is the index of a fragment along the associated dimension. Non-fragmented dimensions take the value 0 in that dimension for all fragments. Otherwise each index ranges from 0 to the number of fragments in the corresponding dimension minus 1 (since indexing starts from zero).
_array_space
: The space taken by an array in the array space.
fragment_space
: The total shape of the fragment array in fragment space. (Formerly fragment_array_shape.)
fragment_position(s)
: A single tuple or a list of tuples, where each value is the index of a fragment in fragment space.
fragment_shape(s)
: A single tuple or a list of tuples, where each value is the array shape of the array fragment.
array shape
: The shape of a real data array.
frag_pos/frag_shape
: The identifier for an individual fragment position or shape (see above) when iterating across all or some fragments.
nfrags_per_dim
: The total number of fragments in each dimension (1 for non-fragmented dimensions).
fragmented_dim_indexes
: The indexes of dimensions which are fragmented (0, 1, 2 etc.) in axis index space.
fragment_info
: A dictionary of fragment metadata where each key is the coordinates of a fragment in index space and the value is a dictionary of the attributes specific to that fragment.
constructor_shape
: A tuple object representing the full shape of the fragment array variables. In some cases (i.e. all fragments having the same value) this shape can be used to expand the array into the proper shape. This may not be the most efficient way of implementing this, though. An alternative would be to use a get_location/address method and provide the frag_pos and the whole location/address fragment array variable.
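To show how several of these terms relate to each other, here is a small sketch that derives them from one illustrative input. It is not taken from the cfapyx source. It assumes that fragment_size_per_dim holds one list of fragment sizes per dimension, and the "shape" key used inside fragment_info is a hypothetical attribute name chosen for the example.

```python
from itertools import product

# Assumed layout: one list of (non-padded) fragment sizes per dimension.
# Dimension 0 is split into three fragments; dimensions 1 and 2 are not fragmented.
fragment_size_per_dim = [[4, 4, 2], [6], [8]]

# Number of fragments in each dimension (1 for non-fragmented dimensions).
nfrags_per_dim = tuple(len(sizes) for sizes in fragment_size_per_dim)

# Total shape of the fragment array in fragment space.
fragment_space = nfrags_per_dim

# Shape of the full aggregated data array.
array_shape = tuple(sum(sizes) for sizes in fragment_size_per_dim)

# Indexes of the dimensions that are actually fragmented.
fragmented_dim_indexes = [d for d, n in enumerate(nfrags_per_dim) if n > 1]

# Every fragment position in fragment space; non-fragmented dimensions stay at 0.
fragment_positions = list(product(*(range(n) for n in nfrags_per_dim)))

# Array shape of each fragment, looked up from its position.
fragment_shapes = [
    tuple(fragment_size_per_dim[d][i] for d, i in enumerate(frag_pos))
    for frag_pos in fragment_positions
]

# Per-fragment metadata keyed by position ("shape" is a hypothetical attribute).
fragment_info = {
    frag_pos: {"shape": frag_shape}
    for frag_pos, frag_shape in zip(fragment_positions, fragment_shapes)
}

for frag_pos, frag_shape in zip(fragment_positions, fragment_shapes):
    print(frag_pos, frag_shape)
# (0, 0, 0) (4, 6, 8)
# (1, 0, 0) (4, 6, 8)
# (2, 0, 0) (2, 6, 8)
```

Here frag_pos and frag_shape play the role described above: identifiers for one fragment's position and shape while iterating over the fragments.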