dunedn.preprocessing package
Submodules
dunedn.preprocessing.preprocess module
This module contains the wrapper function for the dunedn preprocess
command.
Example
Preprocess help output:
$ dunedn preprocess --help
usage: dunedn preprocess [-h] [--output OUTPUT] [--force] [--save_sample] runcard
Preprocess dataset of protoDUNE events: dumps planes and training crops.
positional arguments:
runcard the input folder
optional arguments:
-h, --help show this help message and exit
--output OUTPUT, -o OUTPUT
the output folder
--force overwrite existing files if present
--save_sample extract a smaller dataset
- dunedn.preprocessing.preprocess.add_arguments_preprocessing(parser: ArgumentParser)[source]
Adds preprocessing subparser arguments.
- Parameters
parser (ArgumentParser) – Preprocessing subparser object.
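The following is a minimal sketch of how a subparser with the options listed in the help output above could be populated with `argparse`. The function name `add_arguments_sketch`, the `dest="command"` setting, and the runcard file name `runcard.yaml` are hypothetical; the actual body of `add_arguments_preprocessing` may differ.

```python
import argparse


def add_arguments_sketch(parser: argparse.ArgumentParser) -> None:
    """Hypothetical stand-in for add_arguments_preprocessing."""
    # Positional argument and options mirror the help output shown above.
    parser.add_argument("runcard", help="the input folder")
    parser.add_argument("--output", "-o", help="the output folder")
    parser.add_argument(
        "--force", action="store_true", help="overwrite existing files if present"
    )
    parser.add_argument(
        "--save_sample", action="store_true", help="extract a smaller dataset"
    )


# Top-level parser with a "preprocess" subcommand (layout is an assumption).
main_parser = argparse.ArgumentParser(prog="dunedn")
subparsers = main_parser.add_subparsers(dest="command")
preprocess_parser = subparsers.add_parser("preprocess")
add_arguments_sketch(preprocess_parser)

# Parse an example command line; "runcard.yaml" is a made-up file name.
args = main_parser.parse_args(
    ["preprocess", "runcard.yaml", "-o", "out", "--save_sample"]
)
```

The resulting `Namespace` carries `runcard`, `output`, `force` and `save_sample` attributes, which is the kind of object the `preprocess` wrapper below expects.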
- dunedn.preprocessing.preprocess.preprocess(args: Namespace)[source]
Wrapper preprocessing function.
- Parameters
args (Namespace) – Parsed command line arguments. It should contain the configcard file name, the dataset directory path and the save_sample boolean option.
- dunedn.preprocessing.preprocess.preprocess_main(dsetup: dict, save_sample: bool)[source]
Preprocessing main function.
Loads an input event from file, makes inference and saves the output.
- Parameters
dsetup (dict) – The dataset setup.
save_sample (bool) – Whether to extract a smaller dataset.
dir_name: Path, directory path to dataset
nb_crops: int, number of crops from each plane
crop_edge: int, crop edge size
pct: float, signal / background crop balance
dunedn.preprocessing.putils module
This module contains the utility functions for the preprocessing step.
- dunedn.preprocessing.putils.crop_planes_and_dump(dir_name: Path, nb_crops: int, crop_size: list[int], pct: float)[source]
Populates the <dir_name>/crop folder.
For each plane stored in <dir_name>/planes, generate nb_crops of size crop_size according to a fixed signal to background percentage.
- Parameters
dir_name (Path) – Directory path to datasets.
nb_crops (int) – Number of crops from a single plane.
crop_size (list[int]) – Crop size, (height, width).
pct (float) – Signal to background crops balancing.
- dunedn.preprocessing.putils.get_crop(clear_plane: ndarray, nb_crops: int = 1000, crop_size: list[int] = [32, 32], pct=0.5) → Tuple[ndarray, ndarray][source]
Finds crop center indices and returns crops around them.
- Parameters
clear_plane (np.ndarray) – Clear plane of shape=(H,W).
nb_crops (int) – Number of crops.
crop_size (list) – Crop [height, width].
pct (float) – Signal / background crops balancing.
- Returns
Crop indices:
row indices, of shape=(nb_crops, crop_edge, 1).
column indices, of shape=(nb_crops, 1, crop_edge).
- Return type
Tuple[np.ndarray, np.ndarray]
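The two returned index arrays have shapes that broadcast against each other, so they can be used together to pull all crops out of a plane in one indexing operation. The sketch below illustrates this with NumPy on a toy plane. The crop centers are hand-picked and the crop edge is odd. How `get_crop` selects centers and handles borders or even sizes is not shown here.

```python
import numpy as np

# Toy plane of shape (H, W); real planes are much larger.
plane = np.arange(6 * 8).reshape(6, 8)

nb_crops, crop_edge = 2, 3
# Hypothetical crop centers as (row, column) pairs.
centers = np.array([[1, 2], [4, 6]])

# Offsets around a center for an odd crop edge: [-1, 0, 1].
offsets = np.arange(crop_edge) - crop_edge // 2

# Row indices, shape (nb_crops, crop_edge, 1).
rows = (centers[:, 0, None] + offsets)[:, :, None]
# Column indices, shape (nb_crops, 1, crop_edge).
cols = (centers[:, 1, None] + offsets)[:, None, :]

# Broadcasting the index arrays gives shape (nb_crops, crop_edge, crop_edge).
crops = plane[rows, cols]
```

Each `crops[i]` is the square window of the plane centered on `centers[i]`.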
- dunedn.preprocessing.putils.get_planes_and_dump(dname: Path, save_sample: bool)[source]
Populates the <dname>/planes directory with APA planes arrays.
Planes come from events in the <dname>/events directory. Planes arrays have shape=(N,C,H,W).
- Parameters
dname (Path) – Path to train|val|test dataset subfolder.
save_sample (bool) – Whether to save a smaller dataset from the original one.
- dunedn.preprocessing.putils.save_normalization_info(dir_name: Path, channel: str)[source]
Stores on disk useful information to apply dataset normalization.
Available normalizations are MinMax | Zscore | Mednorm
- Parameters
dir_name (Path) – Directory path to datasets.
channel (str) – Induction | collection.
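As a rough illustration of what normalization information might consist of, the sketch below computes a few summary statistics on toy data and applies the standard MinMax and Zscore formulas. The set of stored statistics and the dictionary keys are assumptions, not the library's on-disk format. The exact definition of Mednorm is not given in this page, so only the median it presumably relies on is recorded.

```python
import numpy as np

# Toy values standing in for the plane data of one channel.
data = np.array([1.0, 2.0, 3.0, 4.0, 10.0])

# Statistics one might store for later use (assumed set and key names).
info = {
    "min": float(data.min()),
    "max": float(data.max()),
    "mean": float(data.mean()),
    "std": float(data.std()),
    "median": float(np.median(data)),
}

# Standard MinMax scaling: maps the data onto [0, 1].
minmax = (data - info["min"]) / (info["max"] - info["min"])

# Standard z-score: zero mean and unit standard deviation.
zscore = (data - info["mean"]) / info["std"]
```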