Scanner Module
- pipeline.scan.format_float(value: int, logger) str
Format a byte value with appropriate units
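A minimal sketch of how such a byte formatter might work, assuming decimal (factor-of-1000) unit scaling; the actual unit boundaries and precision used in `pipeline.scan` may differ:

```python
def format_float(value, logger=None):
    """Format a byte value with appropriate units (illustrative sketch)."""
    if value is None:
        return None
    # Assumed decimal scaling; the real implementation may use 1024-based units.
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if abs(value) < 1000:
            return f"{value:.2f} {unit}"
        value /= 1000
    return f"{value:.2f} PB"

print(format_float(2_500_000))  # → "2.50 MB"
```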
- pipeline.scan.format_seconds(seconds: int) str
Convert time in seconds to MM:SS
- pipeline.scan.get_seconds(time_allowed: str) int
Convert time in MM:SS to seconds
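The pair of time helpers above can be sketched as follows; the exact zero-padding convention here is an assumption:

```python
def format_seconds(seconds: int) -> str:
    """Convert a duration in seconds to an MM:SS string."""
    mins, secs = divmod(seconds, 60)
    return f"{mins}:{secs:02d}"

def get_seconds(time_allowed: str) -> int:
    """Convert an MM:SS string back to a duration in seconds."""
    mins, secs = time_allowed.split(":")
    return int(mins) * 60 + int(secs)

print(format_seconds(150))  # → "2:30"
print(get_seconds("2:30"))  # → 150
```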
- pipeline.scan.perform_safe_calculations(std_vars: list, cpf: list, volms: list, nfiles: int, logger) tuple
Perform all calculations safely to mitigate errors that arise during data collation.
- Parameters:
std_vars – (list) A list of the variables collected, which should be the same across all input files.
cpf – (list) The chunks per file recorded for each input file.
volms – (list) The total data size recorded for each input file.
nfiles – (int) The total number of files for this dataset.
logger – (obj) Logging object for info/debug/error messages.
- Returns:
A tuple of average and total values: average chunks per file (cpf), number of variables (num_vars), average chunk size (avg_chunk), spatial resolution of each chunk assuming a 2:1 lat/lon ratio (spatial_res), total NetCDF and Kerchunk estimated data sizes, number of files, total number of chunks, and the addition percentage.
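An illustrative sketch of the defensive-averaging idea behind this function. It computes only a subset of the documented return values, and the strategy of skipping None entries is an assumption:

```python
def perform_safe_calculations(std_vars, cpf, volms, nfiles, logger=None):
    """Illustrative sketch: average collated metrics defensively."""
    def safe_avg(values):
        # Ignore None entries so a single failed file does not break the average.
        vals = [v for v in values if v is not None]
        return sum(vals) / len(vals) if vals else None

    avg_cpf = safe_avg(cpf)    # average chunks per file
    avg_vol = safe_avg(volms)  # average data volume per file
    num_vars = len(std_vars) if std_vars else None
    # Average chunk size follows from volume and chunk count per file.
    avg_chunk = avg_vol / avg_cpf if (avg_vol and avg_cpf) else None
    return avg_cpf, num_vars, avg_chunk

perform_safe_calculations(['tas', 'pr'], [10, 20, None], [100.0, 200.0, None], 3)
```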
- pipeline.scan.safe_format(value: int, fstring: str) str
Attempt to format a value according to a given fstring template. Handles formatting issues by returning an empty string, typically when value is None.
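A plausible implementation of this helper; the `value` placeholder name in the template is an assumption:

```python
def safe_format(value, fstring: str) -> str:
    """Attempt to format value with an fstring template; return '' on failure."""
    try:
        return fstring.format(value=value)
    except (ValueError, TypeError, KeyError):
        # Typically triggered when value is None and the template
        # requires a numeric format spec.
        return ''

print(safe_format(3.14159, '{value:.2f}'))  # → "3.14"
print(safe_format(None, '{value:.2f}'))     # → ""
```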
- pipeline.scan.scan_config(args, logger, fh=None, logid=None, **kwargs) None
Configure scanning and access the main section; ensure a few key variables are set, then run scan_dataset.
- Parameters:
args – (obj) Set of command line arguments supplied by argparse.
logger – (obj) Logging object for info/debug/error messages. Will create a new logger object if not given one.
fh – (str) Path to file for logger I/O when defining new logger.
logid – (str) If creating a new logger, an id is needed to distinguish this logger from other single processes (typically n of N total processes).
- Returns:
None
- pipeline.scan.scan_dataset(args, logger) None
Main process handler for the scanning phase.
- pipeline.scan.scan_kerchunk(args, logger, nfiles, limiter)
Function to perform scanning with output Kerchunk format.
- pipeline.scan.scan_zarr(args, logger, nfiles, limiter)
Function to perform scanning with output Zarr format.
- pipeline.scan.summarise_json(identifier, ctype: str, logger=None, proj_dir=None) tuple
Open previously written JSON cached files and perform analysis.
- pipeline.scan.write_skip(proj_dir: str, proj_code: str, logger) None
Quick function to write a ‘skipped’ detail file.
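A minimal sketch of this helper; the file name `skipped.txt` and the file's contents are assumptions for illustration:

```python
import os

def write_skip(proj_dir: str, proj_code: str, logger=None) -> None:
    """Illustrative sketch: record that this project was skipped during scanning."""
    # Hypothetical file name and format; the real detail file may differ.
    with open(os.path.join(proj_dir, 'skipped.txt'), 'w') as f:
        f.write(f'{proj_code} - skipped\n')
```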