Scanner Module

pipeline.scan.format_float(value: int, logger) str

Format a byte value as a string with appropriate units.
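
For example, a minimal usage sketch (only the call signature above is documented; the returned label is illustrative):

    import logging
    from pipeline.scan import format_float

    logger = logging.getLogger('scan-example')
    label = format_float(1073741824, logger)   # 1 GiB worth of bytes
    # Returns a human-readable size string, e.g. roughly '1 GB' (illustrative output).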

pipeline.scan.format_seconds(seconds: int) str

Convert a time in seconds to an MM:SS string.

pipeline.scan.get_seconds(time_allowed: str) int

Convert an MM:SS time string to seconds.
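
The two time helpers above are inverses of one another. A minimal usage sketch, assuming MM:SS strings of the form '02:30':

    from pipeline.scan import format_seconds, get_seconds

    total_seconds = get_seconds('02:30')    # '02:30' -> 150
    time_allowed  = format_seconds(150)     # 150 -> '02:30'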

pipeline.scan.perform_safe_calculations(std_vars: list, cpf: list, volms: list, nfiles: int, logger) tuple

Perform all calculations safely to mitigate errors that arise during data collation.

Parameters:
  • std_vars – (list) A list of the variables collected, which should be the same across all input files.

  • cpf – (list) The chunks per file recorded for each input file.

  • volms – (list) The total data size recorded for each input file.

  • nfiles – (int) The total number of files for this dataset.

  • logger – (obj) Logging object for info/debug/error messages.

Returns:

A tuple of aggregate values: average chunks per file (cpf), number of variables (num_vars), average chunk size (avg_chunk), spatial resolution of each chunk assuming a 2:1 lat/lon ratio (spatial_res), total estimated data sizes for NetCDF and Kerchunk, the number of files, the total number of chunks, and the addition percentage.
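
A minimal usage sketch, assuming three input files (all input values are illustrative; the returned tuple contains the aggregates described above):

    import logging
    from pipeline.scan import perform_safe_calculations

    logger   = logging.getLogger('scan-example')
    std_vars = ['tas', 'pr']            # variables collected (same across all files)
    cpf      = [120, 118, 122]          # chunks per file, one entry per input file
    volms    = [2.1e9, 2.0e9, 2.2e9]    # total data size per input file

    stats = perform_safe_calculations(std_vars, cpf, volms, nfiles=3, logger=logger)
    # 'stats' is a tuple of the aggregate values listed above (average chunks per
    # file, number of variables, average chunk size, spatial resolution, data
    # totals, file and chunk counts, and the addition percentage).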

pipeline.scan.safe_format(value: int, fstring: str) str

Attempt to format a value using the given f-string template. Handles failures by returning '', typically when value is None.
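
A minimal usage sketch (the template string below is hypothetical; only the behaviour of returning '' on failure is documented):

    from pipeline.scan import safe_format

    template = '{value:.2f} GB'         # hypothetical f-string template
    safe_format(3.14159, template)      # a formatted string such as '3.14 GB'
    safe_format(None, template)         # '' -- formatting fails safely for None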

pipeline.scan.scan_config(args, logger, fh=None, logid=None, **kwargs) None

Configure the scanning process: access the main section, ensure a few key variables are set, then run scan_dataset.

Parameters:
  • args – (obj) Set of command line arguments supplied by argparse.

  • logger – (obj) Logging object for info/debug/error messages. Will create a new logger object if not given one.

  • fh – (str) Path to file for logger I/O when defining new logger.

  • logid – (str) If creating a new logger, an id is needed to distinguish this logger from other single processes (typically process n of N total).

Returns:

None
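
A minimal usage sketch, assuming the arguments come from argparse as stated above (the specific argument names below are hypothetical, not part of the documented interface):

    import argparse
    from pipeline.scan import scan_config

    parser = argparse.ArgumentParser()
    parser.add_argument('proj_code')          # hypothetical positional argument
    parser.add_argument('-w', '--workdir')    # hypothetical option
    args = parser.parse_args()

    # Pass logger=None so scan_config creates its own logger, writing to the
    # given file path and tagged with an id for this process.
    scan_config(args, logger=None, fh='logs/scan_1.log', logid='1')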

pipeline.scan.scan_dataset(args, logger) None

Main process handler for the scanning phase.

pipeline.scan.scan_kerchunk(args, logger, nfiles, limiter)

Perform scanning with Kerchunk as the output format.

pipeline.scan.scan_zarr(args, logger, nfiles, limiter)

Perform scanning with Zarr as the output format.
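
A sketch of how one of these two scanners might be selected, reusing the args and logger objects from the sketches above (the 'mode' attribute name and the nfiles/limiter values are assumptions; in the pipeline this choice is made by scan_dataset):

    from pipeline.scan import scan_kerchunk, scan_zarr

    # Hypothetical selection between the two scanners based on the requested
    # output format; the 'mode' attribute name is an assumption.
    if getattr(args, 'mode', 'kerchunk') == 'zarr':
        scan_zarr(args, logger, nfiles=100, limiter=20)
    else:
        scan_kerchunk(args, logger, nfiles=100, limiter=20)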

pipeline.scan.summarise_json(identifier, ctype: str, logger=None, proj_dir=None) tuple

Open previously written, cached JSON files and perform analysis.

pipeline.scan.write_skip(proj_dir: str, proj_code: str, logger) None

Quick function to write a ‘skipped’ detail file.
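
A minimal usage sketch (the paths below are illustrative):

    import logging
    from pipeline.scan import write_skip

    logger = logging.getLogger('scan-example')
    write_skip('/work/datasets/my_project', 'my_project', logger)
    # Writes a 'skipped' detail file for this project.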