dunedn.preprocessing package

Submodules

dunedn.preprocessing.preprocess module

This module contains the wrapper function for the dunedn preprocess command.

Example

Preprocess help output:

$ dunedn preprocess --help
usage: dunedn preprocess [-h] [--output OUTPUT] [--force] [--save_sample] runcard

Preprocess dataset of protoDUNE events: dumps planes and training crops.

positional arguments:
  runcard               the input folder

optional arguments:
  -h, --help            show this help message and exit
  --output OUTPUT, -o OUTPUT
                        the output folder
  --force               overwrite existing files if present
  --save_sample         extract a smaller dataset
dunedn.preprocessing.preprocess.add_arguments_preprocessing(parser: ArgumentParser)[source]

Adds preprocessing subparser arguments.

Parameters

parser (ArgumentParser) – Preprocessing subparser object.
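As an illustration of how a subparser could be populated to produce the help output shown above, here is a minimal sketch. The function name `add_arguments_preprocessing_sketch`, the use of `Path` as argument type, the `dest="command"` name, and the runcard path are assumptions made for the example, not the package's actual implementation.

```python
import argparse
from pathlib import Path


def add_arguments_preprocessing_sketch(parser: argparse.ArgumentParser) -> None:
    """Register the options listed in the help output (hypothetical re-creation)."""
    parser.add_argument("runcard", type=Path, help="the input folder")
    parser.add_argument("--output", "-o", type=Path, help="the output folder")
    parser.add_argument(
        "--force", action="store_true", help="overwrite existing files if present"
    )
    parser.add_argument(
        "--save_sample", action="store_true", help="extract a smaller dataset"
    )


# Build a top-level parser with a "preprocess" subcommand and attach the options.
main_parser = argparse.ArgumentParser(prog="dunedn")
subparsers = main_parser.add_subparsers(dest="command")
preprocess_parser = subparsers.add_parser("preprocess")
add_arguments_preprocessing_sketch(preprocess_parser)

# Parse a hypothetical command line instead of sys.argv.
args = main_parser.parse_args(
    ["preprocess", "cards/example.yaml", "-o", "out", "--save_sample"]
)
print(args)
```

The resulting `Namespace` is the kind of object that a wrapper such as `preprocess(args)` would receive.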

dunedn.preprocessing.preprocess.preprocess(args: Namespace)[source]

Wrapper preprocessing function.

Parameters

args (Namespace) – Parsed command-line arguments. It should contain the configcard file name, the dataset directory path, and the save_sample boolean option.

dunedn.preprocessing.preprocess.preprocess_main(dsetup: dict, save_sample: bool)[source]

Preprocessing main function.

Loads an input event from file, makes inference and saves the output.

Parameters
  • dsetup (dict) – The dataset setup.

  • save_sample (bool) –

    Whether to extract a smaller dataset.

    • dir_name: Path, directory path to dataset

    • nb_crops: int, number of crops from each plane

    • crop_edge: int, crop edge size

    • pct: float, signal / background crop balance

dunedn.preprocessing.putils module

This module contains the utility functions for the preprocessing step.

dunedn.preprocessing.putils.crop_planes_and_dump(dir_name: Path, nb_crops: int, crop_size: list[int], pct: float)[source]

Populates the <dir_name>/crop folder.

For each plane stored in <dir_name>/planes, generates nb_crops crops of size crop_size according to a fixed signal-to-background percentage.

Parameters
  • dir_name (Path) – Directory path to datasets.

  • nb_crops (int) – Number of crops from a single plane.

  • crop_size (list[int]) – Crop size, (height, width).

  • pct (float) – Signal to background crops balancing.
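One way to read the signal-to-background balancing is sketched below: roughly `pct` of the crop centers are drawn from signal pixels and the rest from background pixels. This is a sketch under stated assumptions, not the package's code: the helper `sample_centers` is hypothetical, and treating nonzero pixels of the clear plane as signal is an assumption.

```python
import numpy as np


def sample_centers(clear_plane, nb_crops, pct, rng):
    """Draw crop centers: about pct of them on signal pixels, the rest on background."""
    # Assumption: a pixel counts as signal when the clear plane is nonzero there.
    signal = np.argwhere(clear_plane != 0)      # (n_signal, 2) row/column pairs
    background = np.argwhere(clear_plane == 0)  # (n_background, 2) row/column pairs
    nb_signal = int(nb_crops * pct)
    nb_background = nb_crops - nb_signal
    # Sample with replacement so small signal regions can still fill their quota.
    sig_idx = rng.choice(len(signal), size=nb_signal, replace=True)
    bkg_idx = rng.choice(len(background), size=nb_background, replace=True)
    return np.concatenate([signal[sig_idx], background[bkg_idx]])


rng = np.random.default_rng(0)
plane = np.zeros((64, 96), dtype=np.float32)
plane[20:24, 30:60] = 1.0  # a toy "track" of nonzero signal
centers = sample_centers(plane, nb_crops=10, pct=0.5, rng=rng)
print(centers.shape)
```

With `pct=0.5` and `nb_crops=10`, the first five centers fall on the toy track and the last five on empty background.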

dunedn.preprocessing.putils.get_crop(clear_plane: ndarray, nb_crops: int = 1000, crop_size: list[int] = [32, 32], pct=0.5) → Tuple[ndarray, ndarray][source]

Finds the indices of the crop centers and returns the crops around them.

Parameters
  • clear_plane (np.ndarray) – Clear plane of shape=(H,W).

  • nb_crops (int) – Number of crops.

  • crop_size (list) – Crop [height, width].

  • pct (float) – Signal / background crops balancing.

Returns

Crop indices:

  • row indices, of shape=(nb_crops, crop_edge, 1).

  • column indices, of shape=(nb_crops, 1, crop_edge).

Return type

Tuple[np.ndarray, np.ndarray]
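The two returned shapes are chosen so that the index arrays broadcast against each other under NumPy advanced indexing. The sketch below shows that mechanism only; the helper `crop_indices`, the centering convention, and the toy centers are assumptions, and border handling is left out.

```python
import numpy as np


def crop_indices(centers, crop_size):
    """Build broadcastable row and column index arrays around each center."""
    height, width = crop_size
    # Offsets are taken so that each crop starts height // 2 rows above its center.
    rows = centers[:, 0, None] + np.arange(height) - height // 2  # (nb_crops, height)
    cols = centers[:, 1, None] + np.arange(width) - width // 2    # (nb_crops, width)
    # Add singleton axes: (nb_crops, height, 1) and (nb_crops, 1, width).
    return rows[:, :, None], cols[:, None, :]


plane = np.arange(64 * 96).reshape(64, 96)
centers = np.array([[16, 16], [40, 70]])  # chosen far enough from the borders
rows, cols = crop_indices(centers, crop_size=[32, 32])

# Indexing with both arrays broadcasts them to (nb_crops, height, width).
crops = plane[rows, cols]
print(rows.shape, cols.shape, crops.shape)
```

Indexing a plane with the pair of arrays yields all crops in one vectorized operation, without looping over the centers.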

dunedn.preprocessing.putils.get_planes_and_dump(dname: Path, save_sample: bool)[source]

Populates the <dname>/planes directory with APA planes arrays.

Planes come from events in the <dname>/events directory. Planes arrays have shape=(N,C,H,W).

Parameters
  • dname (Path) – Path to train|val|test dataset subfolder.

  • save_sample (bool) – Whether to save a smaller dataset from the original one.

dunedn.preprocessing.putils.save_normalization_info(dir_name: Path, channel: str)[source]

Stores on disk useful information to apply dataset normalization.

Available normalizations are MinMax | Zscore | Mednorm

Parameters
  • dir_name (Path) – Directory path to datasets.

  • channel (str) – Induction | collection.
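To make the kind of stored information concrete, the following sketch computes the statistics that min-max and z-score normalization require, plus the median. It is an assumption-laden illustration: the helper `normalization_stats` is hypothetical, the exact definition of Mednorm is not given here (only the median is collected), and writing to disk is omitted because the file layout is not documented in this section.

```python
import numpy as np


def normalization_stats(planes):
    """Collect statistics that MinMax, Zscore and a median-based scaling could use."""
    return {
        "min": float(planes.min()),
        "max": float(planes.max()),
        "mean": float(planes.mean()),
        "std": float(planes.std()),
        "median": float(np.median(planes)),
    }


planes = np.array([[0.0, 2.0], [4.0, 10.0]])  # toy data standing in for plane arrays
stats = normalization_stats(planes)

# Applying the two standard normalizations with the collected statistics.
minmax = (planes - stats["min"]) / (stats["max"] - stats["min"])
zscore = (planes - stats["mean"]) / stats["std"]
print(stats)
```

Computing the statistics once and reusing them keeps the normalization of training, validation and test data consistent.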

Module contents