pyBIA.data_processing
=====================

.. py:module:: pyBIA.data_processing

.. autoapi-nested-parse::

   Created on Thu Sep 16 21:43:16 2021

   @author: daniel


Functions
---------

.. autoapisummary::

   pyBIA.data_processing.find_duplicate_features
   pyBIA.data_processing.crop_image
   pyBIA.data_processing.concat_channels
   pyBIA.data_processing.normalize_pixels
   pyBIA.data_processing.process_class
   pyBIA.data_processing.create_training_set
   pyBIA.data_processing.signed_log_transform


Module Contents
---------------

.. py:function:: find_duplicate_features(features, tolerance=1e-09)

   Check for duplicate feature columns using elementwise closeness.

   :param features: Feature matrix with features stored column-wise.
   :type features: ndarray, shape (n_samples, n_features)
   :param tolerance: Absolute tolerance for equality checks passed to np.isclose. Default is 1e-9.
   :type tolerance: float, optional

   :returns: Zero-based indices of columns identified as duplicates of at least one other column within the tolerance; empty if none.
   :rtype: set of int
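
The column comparison described above can be sketched as follows. This is a minimal illustration, not the library's implementation. The helper name ``duplicate_columns_sketch`` is hypothetical, and two details are assumptions: only the later column of each matching pair is reported (so dropping the returned indices keeps one copy), and the relative tolerance of ``np.isclose`` is set to zero so that the absolute tolerance is the only criterion.

```python
import numpy as np

def duplicate_columns_sketch(features, tolerance=1e-9):
    """Return indices of columns that match an earlier column elementwise."""
    features = np.asarray(features, dtype=float)
    duplicates = set()
    n_features = features.shape[1]
    for i in range(n_features):
        if i in duplicates:
            continue  # already matched to an earlier column
        for j in range(i + 1, n_features):
            # Assumption: absolute tolerance only (rtol=0).
            close = np.isclose(features[:, i], features[:, j],
                               rtol=0.0, atol=tolerance)
            if np.all(close):
                duplicates.add(j)
    return duplicates

# Column 2 repeats column 0 up to a 1e-12 perturbation.
X = np.array([[1.0, 4.0, 1.0],
              [2.0, 5.0, 2.0],
              [3.0, 6.0, 3.0 + 1e-12]])
dups = duplicate_columns_sketch(X)
X_unique = np.delete(X, sorted(dups), axis=1)  # drop the flagged columns
```

Returning a set makes it straightforward to pass the sorted indices to ``np.delete`` when pruning the feature matrix.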


.. py:function:: crop_image(data, x, y, size=50, invert=False)

   Return a square sub-array of side length `size` centered on (x, y) from a 2D image, padding with NaNs where the crop extends beyond the image bounds.

   :param data: Input 2D image array.
   :type data: ndarray
   :param x: Column (x) coordinate of the crop center, in the image's coordinate convention.
   :type x: int
   :param y: Row (y) coordinate of the crop center, in the image's coordinate convention.
   :type y: int
   :param size: Width/height of the output crop in pixels; must be a positive integer. Default is 50.
   :type size: int, optional
   :param invert: If True, swap the provided (x, y) before cropping (useful when the input coordinates follow a different axis convention from the array indexing the cropper assumes, e.g. positions taken from FITS images). Default is False.
   :type invert: bool, optional

   :returns: Cropped array of shape (size, size); regions falling outside the input image are padded with NaNs.
   :rtype: ndarray
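
The NaN-padding behaviour can be illustrated with a short NumPy sketch. The helper ``crop_with_nan_padding`` is hypothetical and is not the library's code. It assumes that ``x`` indexes columns and ``y`` indexes rows, and that the crop window starts ``size // 2`` pixels before the center; the exact centering rule for even sizes is an assumption. The ``invert`` option is omitted; under these assumptions, swapping ``x`` and ``y`` before the call has the same effect.

```python
import numpy as np

def crop_with_nan_padding(data, x, y, size=50):
    """Cut a (size, size) window around (x, y), filling gaps with NaN."""
    data = np.asarray(data, dtype=float)
    half = size // 2
    out = np.full((size, size), np.nan)

    # Window bounds in image coordinates (may fall outside the image).
    r0, c0 = y - half, x - half
    r1, c1 = r0 + size, c0 + size

    # Part of the window that overlaps the image.
    rr0, rr1 = max(r0, 0), min(r1, data.shape[0])
    cc0, cc1 = max(c0, 0), min(c1, data.shape[1])

    if rr0 < rr1 and cc0 < cc1:
        out[rr0 - r0:rr1 - r0, cc0 - c0:cc1 - c0] = data[rr0:rr1, cc0:cc1]
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
corner = crop_with_nan_padding(image, x=0, y=0, size=3)  # overlaps the top-left corner
```

Pre-filling the output with NaN and copying only the overlapping region avoids special cases for each image edge.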


.. py:function:: concat_channels(channel1, channel2, channel3=None, channel4=None, channel5=None)

   Concatenate up to five single-band 2D images along a new last axis, producing a multi-channel tensor.

   :param channel1: 2D array for the first channel (H × W). Must have the same height and width as the other channels.
   :type channel1: ndarray
   :param channel2: 2D array for the second channel (H × W). Must have the same height and width as the other channels.
   :type channel2: ndarray
   :param channel3: 2D array for the third channel (H × W). Must have the same height and width as the other channels. Default is None.
   :type channel3: ndarray or None, optional
   :param channel4: 2D array for the fourth channel (H × W). Must have the same height and width as the other channels. Default is None.
   :type channel4: ndarray or None, optional
   :param channel5: 2D array for the fifth channel (H × W). Must have the same height and width as the other channels. Default is None.
   :type channel5: ndarray or None, optional

   :returns: 3D array of shape (H, W, C), where C is the number of channels supplied (between 2 and 5). The dtype matches the input arrays.
   :rtype: ndarray
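
Stacking along a new last axis can be sketched with ``np.stack``. The helper ``stack_channels_sketch`` is hypothetical and only illustrates the idea; in this sketch the channel count of the result equals the number of inputs that are not None, and mismatched shapes raise ``ValueError`` (the library's own error handling may differ).

```python
import numpy as np

def stack_channels_sketch(channel1, channel2, *optional_channels):
    """Stack same-shaped 2D arrays into an (H, W, C) array."""
    channels = [np.asarray(channel1), np.asarray(channel2)]
    # Skip optional channels that were left as None.
    channels += [np.asarray(c) for c in optional_channels if c is not None]

    shapes = {c.shape for c in channels}
    if len(shapes) != 1:
        raise ValueError(f"all channels must share one shape, got {shapes}")
    if channels[0].ndim != 2:
        raise ValueError("each channel must be a 2D array")

    return np.stack(channels, axis=-1)  # new last axis holds the channels

band_a = np.zeros((4, 6))
band_b = np.ones((4, 6))
two_band = stack_channels_sketch(band_a, band_b)
three_band = stack_channels_sketch(band_a, band_b, band_a, None)
```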


.. py:function:: normalize_pixels(channels, min_pixel, max_pixel, img_num_channels)

   Clip and min–max normalize image data per channel into [0, 1], handling 2D, 3D, and 4D inputs.

   :param channels: Input image data as (H, W), (N, H, W), (H, W, C), or (N, H, W, C); non-finite values are set to `min_pixel`.
   :type channels: ndarray
   :param min_pixel: Lower clip bound applied to all channels before normalization.
   :type min_pixel: float
   :param max_pixel: Upper clip bound; a scalar for single-channel data or a list of length `img_num_channels` for multi-channel data.
   :type max_pixel: float or list of float
   :param img_num_channels: Number of channels expected in the output; used to validate or reshape inputs.
   :type img_num_channels: int

   :returns: Normalized array with values in [0, 1]; shape is (N, H, W) for single-channel inputs or (N, H, W, C) for multi-channel inputs.
   :rtype: ndarray

   :raises ValueError: If `max_pixel` type is incompatible with `img_num_channels`, if shapes are inconsistent with `img_num_channels`,
       or if the input dimensionality is not 2D, 3D, or 4D.
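
A minimal sketch of the clip-and-scale step for channels-last data follows. The helper ``clip_and_scale_sketch`` is hypothetical. It assumes that scaling uses the clip bounds themselves, i.e. ``(value - min_pixel) / (max_pixel - min_pixel)``, rather than the observed minimum and maximum of the data; the library may differ, and the reshaping and validation steps are omitted.

```python
import numpy as np

def clip_and_scale_sketch(images, min_pixel, max_pixel):
    """Clip (..., C) data to [min_pixel, max_pixel] and scale to [0, 1]."""
    images = np.array(images, dtype=float)          # copy, leave the input untouched
    upper = np.asarray(max_pixel, dtype=float)      # scalar or one bound per channel
    images[~np.isfinite(images)] = min_pixel        # NaN and inf go to the lower bound
    clipped = np.clip(images, min_pixel, upper)     # bounds broadcast over the last axis
    return (clipped - min_pixel) / (upper - min_pixel)

# One image, 1 x 2 pixels, two channels with different upper bounds.
batch = np.array([[[[-5.0, 50.0],
                    [20.0, np.nan]]]])
scaled = clip_and_scale_sketch(batch, min_pixel=0.0, max_pixel=[10.0, 100.0])
```

Because a per-channel ``max_pixel`` has length C, it broadcasts against the last axis of an (N, H, W, C) array without an explicit loop.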


.. py:function:: process_class(channel, label=None, img_num_channels=1, normalize=True, min_pixel=638, max_pixel=3000)

   Reshape image data to (N, H, W, C) and optionally apply per-channel min–max normalization; optionally return one-hot labels.

   :param channel: Input images as (H, W) for one image, (N, H, W) for many images, (H, W, C) for one multi-channel image, or (N, H, W, C) for many multi-channel images.
   :type channel: ndarray
   :param label: Class label encoded as 0 or 1; if provided, a one-hot label array is returned alongside the data. Default is None.
   :type label: int or None, optional
   :param img_num_channels: Number of channels per sample, used to validate and reshape the output. Default is 1.
   :type img_num_channels: int, optional
   :param normalize: If True, clip to [`min_pixel`, `max_pixel`] and scale each channel to [0, 1] using `normalize_pixels`. Default is True.
   :type normalize: bool, optional
   :param min_pixel: Lower clip bound applied when `normalize=True`; non-finite values are also set to this bound. Default is 638.
   :type min_pixel: float, optional
   :param max_pixel: Upper clip bound(s) applied when `normalize=True`: a scalar for single-channel data or a list of length `img_num_channels` for multi-channel data. Default is 3000.
   :type max_pixel: float or list of float, optional

   :returns: * *ndarray* -- Data array shaped (N, H, W, C) with values in [0, 1] if normalized, otherwise an unscaled copy.
             * *ndarray* -- One-hot label array shaped (N, 2) returned only when `label` is not None.
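
The reshaping and labelling described above can be sketched as follows. Both helpers are hypothetical and are not the library's code. The sketch assumes that a 3D input is read as (N, H, W) when ``img_num_channels`` is 1 and as (H, W, C) otherwise, and that the one-hot column index equals the class label; normalization is left out.

```python
import numpy as np

def to_nhwc_sketch(images, img_num_channels=1):
    """Reshape supported inputs to (N, H, W, C)."""
    images = np.asarray(images)
    if images.ndim == 2 and img_num_channels == 1:
        return images[np.newaxis, :, :, np.newaxis]   # (H, W) -> (1, H, W, 1)
    if images.ndim == 3 and img_num_channels == 1:
        return images[..., np.newaxis]                # (N, H, W) -> (N, H, W, 1)
    if images.ndim == 3 and images.shape[-1] == img_num_channels:
        return images[np.newaxis, ...]                # (H, W, C) -> (1, H, W, C)
    if images.ndim == 4 and images.shape[-1] == img_num_channels:
        return images                                 # already (N, H, W, C)
    raise ValueError("input shape is inconsistent with img_num_channels")

def one_hot_binary_sketch(label, n_samples):
    """Build an (n_samples, 2) one-hot array for a single class label."""
    if label not in (0, 1):
        raise ValueError("label must be 0 or 1")
    labels = np.zeros((n_samples, 2))
    labels[:, label] = 1.0   # assumption: column index equals the class label
    return labels

single = to_nhwc_sketch(np.zeros((50, 50)))
stack = to_nhwc_sketch(np.zeros((7, 50, 50)))
rgb = to_nhwc_sketch(np.zeros((50, 50, 3)), img_num_channels=3)
stack_labels = one_hot_binary_sketch(1, stack.shape[0])
```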


.. py:function:: create_training_set(blob_data, other_data, img_num_channels=1, normalize=True, min_pixel=638, max_pixel=3000)

   Combine positive and negative image stacks into a single training tensor with one-hot labels.

   :param blob_data: Positive-class images shaped (N, H, W) or (N, H, W, C) with C = img_num_channels; these receive label 1.
   :type blob_data: ndarray
   :param other_data: Negative-class images shaped (N, H, W) or (N, H, W, C) with C = img_num_channels; these receive label 0.
   :type other_data: ndarray
   :param img_num_channels: Number of channels per sample, used to validate and reshape the output. Default is 1.
   :type img_num_channels: int, optional
   :param normalize: If True, clip to [`min_pixel`, `max_pixel`] and scale each channel to [0, 1] using `normalize_pixels`. Default is True.
   :type normalize: bool, optional
   :param min_pixel: Lower clip bound applied when `normalize` is True; non-finite values are also set to this bound. Default is 638.
   :type min_pixel: float, optional
   :param max_pixel: Upper clip bound(s) applied when `normalize` is True: a scalar for single-channel data or a list of length `img_num_channels` for multi-channel data. Default is 3000.
   :type max_pixel: float or list of float, optional

   :returns: * *ndarray* -- Training images shaped (N_total, H, W, C) with C = img_num_channels; values in [0, 1] when `normalize` is True.
             * *ndarray* -- One-hot labels shaped (N_total, 2) with class 1 for `blob_data` and class 0 for `other_data`.

   .. rubric:: Notes

   This function is for binary classification only; for multi-class workflows, call `process_class` per class and concatenate the results.
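
Combining the two classes amounts to concatenating the image stacks along the sample axis and building matching one-hot rows. The helper ``binary_training_set_sketch`` below is hypothetical. It assumes single-channel (N, H, W) or channels-last (N, H, W, C) inputs, places the positive samples first without shuffling, and omits normalization; the library's ordering may differ.

```python
import numpy as np

def binary_training_set_sketch(blob_data, other_data):
    """Concatenate positive and negative stacks and build one-hot labels."""
    blob = np.asarray(blob_data, dtype=float)
    other = np.asarray(other_data, dtype=float)
    # Give single-channel stacks an explicit channel axis.
    if blob.ndim == 3:
        blob = blob[..., np.newaxis]
    if other.ndim == 3:
        other = other[..., np.newaxis]

    images = np.concatenate([blob, other], axis=0)
    labels = np.zeros((images.shape[0], 2))
    labels[:blob.shape[0], 1] = 1.0   # positive class -> [0, 1]
    labels[blob.shape[0]:, 0] = 1.0   # negative class -> [1, 0]
    return images, labels

blobs = np.ones((3, 8, 8))
others = np.zeros((5, 8, 8))
train_x, train_y = binary_training_set_sketch(blobs, others)
```

Since the samples come out grouped by class in this sketch, shuffling images and labels with a shared permutation before training is usually advisable.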


.. py:function:: signed_log_transform(x, eps=1e-12)

   Apply a signed base-10 logarithmic transform that preserves the sign of each value.

   This is useful for features spanning several orders of magnitude (e.g., Hu moments)
   that can be positive or negative; zeros remain zero due to the sign factor.

   :param x: Input value(s) to transform.
   :type x: array-like or scalar
   :param eps: Small positive constant added inside the log to avoid log(0). Default is 1e-12.
   :type eps: float, optional

   :returns: Transformed value(s) with the same shape as `x`.
   :rtype: ndarray or scalar
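
One formula consistent with the description above is ``sign(x) * log10(|x| + eps)``. The sketch below implements that reading; the exact expression used by the library is an assumption here, and the helper name ``signed_log10_sketch`` is hypothetical.

```python
import numpy as np

def signed_log10_sketch(x, eps=1e-12):
    """Assumed form: sign(x) * log10(|x| + eps)."""
    x = np.asarray(x, dtype=float)
    # sign(0) is 0, so zeros map to zero even though log10(eps) is finite.
    return np.sign(x) * np.log10(np.abs(x) + eps)

values = np.array([100.0, -1000.0, 0.0])
transformed = signed_log10_sketch(values)  # approximately [2, -3, 0]
```

Under this assumed formula the logarithm itself is negative whenever ``|x| + eps`` is below 1, so for such inputs the output sign is opposite to the input sign. It is worth checking the source for the exact convention before relying on the sign of small-magnitude features.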