pyBIA.data_processing

Created on Thu Sep 16 21:43:16 2021

@author: daniel

Functions

find_duplicate_features(features[, tolerance])

Check for duplicate feature columns using elementwise closeness.

crop_image(data, x, y[, size, invert])

Return a square sub-array of length size centered on (x, y) from a 2D image, padding with NaNs if the crop extends beyond the image bounds.

concat_channels(channel1, channel2[, channel3, ...])

Concatenate up to five single-band 2D images along a new last axis, producing a multi-channel tensor.

normalize_pixels(channels, min_pixel, max_pixel, ...)

Clip and min–max normalize image data per channel into [0, 1], handling 2D, 3D, and 4D inputs.

process_class(channel[, label, img_num_channels, ...])

Reshape image data to (N, H, W, C) and optionally apply per-channel min–max normalization; optionally return one-hot labels.

create_training_set(blob_data, other_data[, ...])

Combine positive and negative image stacks into a single training tensor with one-hot labels.

signed_log_transform(x[, eps])

Apply a signed base-10 logarithmic transform that preserves the sign of each value.

Module Contents

pyBIA.data_processing.find_duplicate_features(features, tolerance=1e-09)[source]

Check for duplicate feature columns using elementwise closeness.

Parameters:
  • features (ndarray, shape (n_samples, n_features)) – Feature matrix with features stored column-wise.

  • tolerance (float, optional) – Absolute tolerance for equality checks passed to np.isclose. Default is 1e-9.

Returns:

Zero-based indices of columns identified as duplicates of at least one other column within the tolerance; empty if none.

Return type:

set of int
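
The column comparison described above can be sketched with plain NumPy. This is a minimal illustration, not pyBIA's implementation: the helper name find_duplicate_columns is hypothetical, and two points are assumptions not stated in this reference: that both members of a matching pair are reported, and that the relative tolerance of np.isclose is set to zero so only the absolute tolerance applies.

```python
import numpy as np

def find_duplicate_columns(features, tolerance=1e-9):
    """Hypothetical stand-in: flag columns that match another column elementwise."""
    features = np.asarray(features, dtype=float)
    n_features = features.shape[1]
    duplicates = set()
    for i in range(n_features):
        for j in range(i + 1, n_features):
            # Assumption: rtol=0 so that only the absolute tolerance is used.
            close = np.isclose(features[:, i], features[:, j],
                               rtol=0.0, atol=tolerance)
            if np.all(close):
                # Assumption: both columns of a matching pair are reported.
                duplicates.update((i, j))
    return duplicates

X = np.array([[1.0, 2.0, 1.0],
              [3.0, 4.0, 3.0 + 1e-12],
              [5.0, 6.0, 5.0]])
dups = find_duplicate_columns(X)  # columns 0 and 2 agree within 1e-9
```

The pairwise loop costs O(n_features²) comparisons, which is usually acceptable for the modest feature counts typical of tabular data.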

pyBIA.data_processing.crop_image(data, x, y, size=50, invert=False)[source]

Return a square sub-array of length size centered on (x, y) from a 2D image, padding with NaNs if the crop extends beyond the image bounds.

Parameters:
  • data (ndarray) – Input 2D image array.

  • x (int) – Column (x) coordinate of the crop center, in the image’s coordinate convention.

  • y (int) – Row (y) coordinate of the crop center, in the image’s coordinate convention.

  • size (int, optional) – Width/height of the output crop in pixels; must be a positive integer. Default is 50.

  • invert (bool, optional) – If True, swap the provided (x, y) before cropping (useful when the input coordinates follow a different axis convention, such as FITS-style pixel coordinates, than the array indexing the cropper assumes). Default is False.

Returns:

Cropped array of shape (size, size); regions falling outside the input image are padded with NaNs.

Return type:

ndarray
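
The NaN-padding behaviour can be illustrated with a small NumPy sketch. The helper crop_with_nan_padding is hypothetical and is not pyBIA's code. It assumes that x indexes columns and y indexes rows, and that the window starts at center minus size // 2; the exact centering rule for even sizes is not specified in this reference.

```python
import numpy as np

def crop_with_nan_padding(data, x, y, size=50):
    """Hypothetical stand-in: square crop centered on (x, y), NaN outside the image."""
    data = np.asarray(data, dtype=float)
    half = size // 2
    out = np.full((size, size), np.nan)
    # Requested window in image coordinates (assumption: rows = y, columns = x).
    r0, c0 = y - half, x - half
    r1, c1 = r0 + size, c0 + size
    # Portion of the window that actually overlaps the image.
    rs, re = max(r0, 0), min(r1, data.shape[0])
    cs, ce = max(c0, 0), min(c1, data.shape[1])
    if rs < re and cs < ce:
        out[rs - r0:re - r0, cs - c0:ce - c0] = data[rs:re, cs:ce]
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
inside = crop_with_nan_padding(image, x=2, y=2, size=3)  # fully inside the image
corner = crop_with_nan_padding(image, x=0, y=0, size=3)  # first row/column are NaN
```

Because NaN cannot be stored in integer arrays, the sketch converts the input to float before padding.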

pyBIA.data_processing.concat_channels(channel1, channel2, channel3=None, channel4=None, channel5=None)[source]

Concatenate up to five single-band 2D images along a new last axis, producing a multi-channel tensor.

Parameters:
  • channel1 (ndarray) – 2D array for the first channel (H × W). Must have the same height and width as the other channels.

  • channel2 (ndarray) – 2D array for the second channel (H × W). Must have the same height and width as the other channels.

  • channel3 (ndarray or None, optional) – 2D array for the third channel (H × W). Must have the same height and width as the other channels. Default is None.

  • channel4 (ndarray or None, optional) – 2D array for the fourth channel (H × W). Must have the same height and width as the other channels. Default is None.

  • channel5 (ndarray or None, optional) – 2D array for the fifth channel (H × W). Must have the same height and width as the other channels. Default is None.

Returns:

3D array of shape (H, W, C), where C is the number of channels provided (2 to 5). The dtype matches the input arrays.

Return type:

ndarray
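
Stacking along a new last axis corresponds to np.stack with axis=-1. The sketch below is a hedged illustration rather than the library's code: stack_channels is a hypothetical name, and skipping None arguments and raising ValueError on mismatched shapes are assumptions about reasonable behaviour.

```python
import numpy as np

def stack_channels(*channels):
    """Hypothetical stand-in: stack the non-None 2D bands into (H, W, C)."""
    provided = [np.asarray(c) for c in channels if c is not None]
    # Assumption: mismatched band shapes are rejected with ValueError.
    if len({c.shape for c in provided}) != 1:
        raise ValueError("all channels must share the same (H, W) shape")
    return np.stack(provided, axis=-1)

band_a = np.zeros((4, 6))
band_b = np.ones((4, 6))
two = stack_channels(band_a, band_b)                            # shape (4, 6, 2)
three = stack_channels(band_a, band_b, band_a + 2, None, None)  # shape (4, 6, 3)
```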

pyBIA.data_processing.normalize_pixels(channels, min_pixel, max_pixel, img_num_channels)[source]

Clip and min–max normalize image data per channel into [0, 1], handling 2D, 3D, and 4D inputs.

Parameters:
  • channels (ndarray) – Input image data as (H, W), (N, H, W), (H, W, C), or (N, H, W, C); non-finite values are set to min_pixel.

  • min_pixel (float) – Lower clip bound applied to all channels before normalization.

  • max_pixel (float or list of float) – Upper clip bound; a scalar for single-channel data or a list of length img_num_channels for multi-channel data.

  • img_num_channels (int) – Number of channels expected in the output; used to validate or reshape inputs.

Returns:

Normalized array with values in [0, 1]; shape is (N, H, W) for single-channel inputs or (N, H, W, C) for multi-channel inputs.

Return type:

ndarray

Raises:

ValueError – If max_pixel type is incompatible with img_num_channels, if shapes are inconsistent with img_num_channels, or if the input dimensionality is not 2D, 3D, or 4D.
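
As an illustration of clipping followed by per-channel min–max scaling, the sketch below applies (x − min_pixel) / (max_pixel − min_pixel) to an (N, H, W, C) batch. It is not pyBIA's implementation: minmax_per_channel is a hypothetical helper, it handles only the 4D case, and it assumes one upper bound per channel.

```python
import numpy as np

def minmax_per_channel(images, min_pixel, max_pixels):
    """Hypothetical stand-in for the 4D case: images is (N, H, W, C)."""
    images = np.array(images, dtype=float)        # work on a float copy
    images[~np.isfinite(images)] = min_pixel      # NaN/inf -> lower bound
    upper = np.asarray(max_pixels, dtype=float)   # shape (C,), broadcasts over last axis
    if upper.shape != (images.shape[-1],):
        raise ValueError("need one max_pixel value per channel")
    clipped = np.clip(images, min_pixel, upper)
    return (clipped - min_pixel) / (upper - min_pixel)

ch0 = np.array([[500.0, 638.0], [1819.0, 4000.0]])
ch1 = np.array([[np.nan, 638.0], [1638.0, 2638.0]])
batch = np.stack([ch0, ch1], axis=-1)[np.newaxis]  # shape (1, 2, 2, 2)
scaled = minmax_per_channel(batch, 638, [3000, 2638])
```

Values below min_pixel and non-finite values map to 0, and values above a channel's max_pixel map to 1, so outliers cannot stretch the scale.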

pyBIA.data_processing.process_class(channel, label=None, img_num_channels=1, normalize=True, min_pixel=638, max_pixel=3000)[source]

Reshape image data to (N, H, W, C) and optionally apply per-channel min–max normalization; optionally return one-hot labels.

Parameters:
  • channel (ndarray) – Input images as (H, W) for one image, (N, H, W) for many images, (H, W, C) for one multi-channel image, or (N, H, W, C) for many multi-channel images.

  • label (int or None, optional) – Class label, 0 or 1. If provided, a one-hot label array is returned alongside the data. Default is None.

  • img_num_channels (int, optional) – Number of channels per sample, used to validate and reshape the output. Default is 1.

  • normalize (bool, optional) – If True, clip to [min_pixel, max_pixel] and scale each channel to [0, 1] using normalize_pixels. Default is True.

  • min_pixel (float, optional) – Lower clip bound applied when normalize is True; non-finite values are also set to this bound. Default is 638.

  • max_pixel (float or list of float, optional) – Upper clip bound(s) applied when normalize is True: a scalar for single-channel data or a list of length img_num_channels for multi-channel data. Default is 3000.

Returns:

  • ndarray – Data array shaped (N, H, W, C) with values in [0, 1] if normalized, otherwise an unscaled copy.

  • ndarray – One-hot label array shaped (N, 2) returned only when label is not None.
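
The reshaping and labeling steps can be sketched independently of the library. Both helpers below (to_nhwc and one_hot) are hypothetical names, and the way ambiguous 3D inputs are resolved using img_num_channels is an assumption about how such a routine might reasonably behave.

```python
import numpy as np

def to_nhwc(images, img_num_channels=1):
    """Hypothetical stand-in: coerce supported layouts to (N, H, W, C)."""
    images = np.asarray(images)
    if img_num_channels == 1:
        if images.ndim == 2:                       # (H, W) -> (1, H, W, 1)
            return images[np.newaxis, ..., np.newaxis]
        if images.ndim == 3:                       # (N, H, W) -> (N, H, W, 1)
            return images[..., np.newaxis]
    elif images.ndim == 3 and images.shape[-1] == img_num_channels:
        return images[np.newaxis]                  # (H, W, C) -> (1, H, W, C)
    if images.ndim == 4 and images.shape[-1] == img_num_channels:
        return images
    raise ValueError("input shape is inconsistent with img_num_channels")

def one_hot(label, n_samples, n_classes=2):
    """Hypothetical stand-in: repeat one class label as one-hot rows."""
    labels = np.zeros((n_samples, n_classes))
    labels[:, label] = 1.0
    return labels

single = to_nhwc(np.zeros((5, 5)))                 # shape (1, 5, 5, 1)
stack = to_nhwc(np.zeros((3, 5, 5)))               # shape (3, 5, 5, 1)
multi = to_nhwc(np.zeros((5, 5, 2)), img_num_channels=2)  # shape (1, 5, 5, 2)
labels = one_hot(1, n_samples=3)                   # three rows of [0, 1]
```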

pyBIA.data_processing.create_training_set(blob_data, other_data, img_num_channels=1, normalize=True, min_pixel=638, max_pixel=3000)[source]

Combine positive and negative image stacks into a single training tensor with one-hot labels.

Parameters:
  • blob_data (ndarray) – Positive-class images shaped (N, H, W) or (N, H, W, C) with C = img_num_channels; these receive label 1.

  • other_data (ndarray) – Negative-class images shaped (N, H, W) or (N, H, W, C) with C = img_num_channels; these receive label 0.

  • img_num_channels (int, optional) – Number of channels per sample, used to validate and reshape the output. Default is 1.

  • normalize (bool, optional) – If True, clip to [min_pixel, max_pixel] and scale each channel to [0, 1] using normalize_pixels. Default is True.

  • min_pixel (float, optional) – Lower clip bound applied when normalize is True; non-finite values are also set to this bound. Default is 638.

  • max_pixel (float or list of float, optional) – Upper clip bound(s) applied when normalize is True: a scalar for single-channel data or a list of length img_num_channels for multi-channel data. Default is 3000.

Returns:

  • ndarray – Training images shaped (N_total, H, W, C) with C = img_num_channels; values in [0, 1] when normalize is True.

  • ndarray – One-hot labels shaped (N_total, 2) with class 1 for blob_data and class 0 for other_data.

Notes

This function is for binary classification only; for multi-class workflows, call process_class per class and concatenate the results.
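
The combination step amounts to concatenating the two stacks along the sample axis and building matching one-hot rows. The sketch below is an assumption-laden illustration: combine_binary_classes is a hypothetical helper, it expects inputs already shaped (N, H, W, C), and placing the positive class first in the output is a guess, not documented behaviour.

```python
import numpy as np

def combine_binary_classes(positive, negative):
    """Hypothetical stand-in: merge two (N, H, W, C) stacks with one-hot labels."""
    images = np.concatenate([positive, negative], axis=0)
    labels = np.zeros((len(images), 2))
    labels[:len(positive), 1] = 1.0   # positive class -> [0, 1]
    labels[len(positive):, 0] = 1.0   # negative class -> [1, 0]
    return images, labels

blobs = np.ones((2, 4, 4, 1))
others = np.zeros((3, 4, 4, 1))
X_train, y_train = combine_binary_classes(blobs, others)
```

For more than two classes, the same pattern extends by widening the label array to one column per class and concatenating one stack per class.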

pyBIA.data_processing.signed_log_transform(x, eps=1e-12)[source]

Apply a signed base-10 logarithmic transform that preserves the sign of each value.

This is useful for features spanning several orders of magnitude (e.g., Hu moments) that can be positive or negative; zeros remain zero due to the sign factor.

Parameters:
  • x (array-like or scalar) – Input value(s) to transform.

  • eps (float, optional) – Small positive constant added inside the log to avoid log(0). Default is 1e-12.

Returns:

Transformed value(s) with the same shape as x.

Return type:

ndarray or scalar
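
One formula consistent with the description above is sign(x) · log10(|x| + eps). The sketch below uses it as an assumption; the exact expression in pyBIA is not given in this reference, and signed_log10 is a hypothetical name.

```python
import numpy as np

def signed_log10(x, eps=1e-12):
    """Hypothetical stand-in, assuming sign(x) * log10(|x| + eps)."""
    x = np.asarray(x, dtype=float)
    # np.sign(0) is 0, so zeros stay zero even though log10(eps) is finite.
    return np.sign(x) * np.log10(np.abs(x) + eps)

values = np.array([-1000.0, 0.0, 100.0])
transformed = signed_log10(values)  # approximately [-3, 0, 2]
```

Under this assumed formula the output carries the input's sign only where |x| + eps exceeds 1; for smaller magnitudes log10 is itself negative, so the product's sign flips. Whether pyBIA treats that range differently is not stated here.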