pyBIA.data_processing
=====================

.. py:module:: pyBIA.data_processing

.. autoapi-nested-parse::

   Created on Thu Sep 16 21:43:16 2021

   @author: daniel


Functions
---------

.. autoapisummary::

   pyBIA.data_processing.find_duplicate_features
   pyBIA.data_processing.crop_image
   pyBIA.data_processing.concat_channels
   pyBIA.data_processing.normalize_pixels
   pyBIA.data_processing.process_class
   pyBIA.data_processing.create_training_set
   pyBIA.data_processing.signed_log_transform


Module Contents
---------------

.. py:function:: find_duplicate_features(features, tolerance=1e-09)

   Check for duplicate feature columns using elementwise closeness.

   :param features: Feature matrix with features stored column-wise.
   :type features: ndarray, shape (n_samples, n_features)
   :param tolerance: Absolute tolerance for equality checks passed to np.isclose. Default is 1e-9.
   :type tolerance: float, optional

   :returns: Zero-based indices of columns identified as duplicates of at least one other column within the tolerance; empty if none.
   :rtype: set of int


.. py:function:: crop_image(data, x, y, size=50, invert=False)

   Return a square sub-array of side length `size` centered on (x, y) from a 2D image, padding with NaNs if the crop extends beyond the image bounds.

   :param data: Input 2D image array.
   :type data: ndarray
   :param x: Column (x) coordinate of the crop center, in the image's coordinate convention.
   :type x: int
   :param y: Row (y) coordinate of the crop center, in the image's coordinate convention.
   :type y: int
   :param size: Width and height of the output crop in pixels; must be a positive integer. Default is 50.
   :type size: int, optional
   :param invert: If True, swap the provided (x, y) before cropping (useful when inputs come from FITS-style top-left origins but the cropper assumes standard indexing). Default is False.
   :type invert: bool, optional

   :returns: Cropped array of shape (size, size); regions falling outside the input image are padded with NaNs.
   :rtype: ndarray


.. py:function:: concat_channels(channel1, channel2, channel3=None, channel4=None, channel5=None)

   Concatenate up to five single-band 2D images along a new last axis, producing a multi-channel tensor.

   :param channel1: 2D array for the first channel (H × W). Must have the same height and width as the other channels.
   :type channel1: ndarray
   :param channel2: 2D array for the second channel (H × W). Must have the same height and width as the other channels.
   :type channel2: ndarray
   :param channel3: 2D array for the third channel (H × W). Must have the same height and width as the other channels. Default is None.
   :type channel3: ndarray or None, optional
   :param channel4: 2D array for the fourth channel (H × W). Must have the same height and width as the other channels. Default is None.
   :type channel4: ndarray or None, optional
   :param channel5: 2D array for the fifth channel (H × W). Must have the same height and width as the other channels. Default is None.
   :type channel5: ndarray or None, optional

   :returns: 3D array of shape (H, W, C), where C is the number of channels supplied (2 to 5). The dtype matches the input arrays.
   :rtype: ndarray


.. py:function:: normalize_pixels(channels, min_pixel, max_pixel, img_num_channels)

   Clip and min–max normalize image data per channel into [0, 1], handling 2D, 3D, and 4D inputs.

   :param channels: Input image data as (H, W), (N, H, W), (H, W, C), or (N, H, W, C); non-finite values are set to `min_pixel`.
   :type channels: ndarray
   :param min_pixel: Lower clip bound applied to all channels before normalization.
   :type min_pixel: float
   :param max_pixel: Upper clip bound; a scalar for single-channel data or a list of length `img_num_channels` for multi-channel data.
   :type max_pixel: float or list of float
   :param img_num_channels: Number of channels expected in the output; used to validate or reshape inputs.
   :type img_num_channels: int

   :returns: Normalized array with values in [0, 1]; shape is (N, H, W) for single-channel inputs or (N, H, W, C) for multi-channel inputs.
   :rtype: ndarray

   :raises ValueError: If the type of `max_pixel` is incompatible with `img_num_channels`, if shapes are inconsistent with `img_num_channels`, or if the input is not 2D, 3D, or 4D.


.. py:function:: process_class(channel, label=None, img_num_channels=1, normalize=True, min_pixel=638, max_pixel=3000)

   Reshape image data to (N, H, W, C) and optionally apply per-channel min–max normalization; optionally return one-hot labels.

   :param channel: Input images as (H, W) for one image, (N, H, W) for many images, (H, W, C) for one multi-channel image, or (N, H, W, C) for many multi-channel images.
   :type channel: ndarray
   :param label: Class label encoded as 0 or 1; if provided, a one-hot label array is returned alongside the data. Default is None.
   :type label: int or None, optional
   :param img_num_channels: Number of channels per sample, used to validate and reshape the output. Default is 1.
   :type img_num_channels: int, optional
   :param normalize: If True, clip to [`min_pixel`, `max_pixel`] and scale each channel to [0, 1] using `normalize_pixels`. Default is True.
   :type normalize: bool, optional
   :param min_pixel: Lower clip bound applied when `normalize` is True; non-finite values are also set to this bound. Default is 638.
   :type min_pixel: float, optional
   :param max_pixel: Upper clip bound(s) applied when `normalize` is True; a scalar for single-channel data or a list of length `img_num_channels` for multi-channel data. Default is 3000.
   :type max_pixel: float or list of float, optional

   :returns: * *ndarray* -- Data array shaped (N, H, W, C) with values in [0, 1] if normalized, otherwise an unscaled copy.
             * *ndarray* -- One-hot label array shaped (N, 2), returned only when `label` is not None.


.. py:function:: create_training_set(blob_data, other_data, img_num_channels=1, normalize=True, min_pixel=638, max_pixel=3000)

   Combine positive and negative image stacks into a single training tensor with one-hot labels.

   :param blob_data: Positive-class images shaped (N, H, W) or (N, H, W, C) with C = `img_num_channels`; these receive label 1.
   :type blob_data: ndarray
   :param other_data: Negative-class images shaped (N, H, W) or (N, H, W, C) with C = `img_num_channels`; these receive label 0.
   :type other_data: ndarray
   :param img_num_channels: Number of channels per sample, used to validate and reshape the output. Default is 1.
   :type img_num_channels: int, optional
   :param normalize: If True, clip to [`min_pixel`, `max_pixel`] and scale each channel to [0, 1] using `normalize_pixels`. Default is True.
   :type normalize: bool, optional
   :param min_pixel: Lower clip bound applied when `normalize` is True; non-finite values are also set to this bound. Default is 638.
   :type min_pixel: float, optional
   :param max_pixel: Upper clip bound(s) applied when `normalize` is True; a scalar for single-channel data or a list of length `img_num_channels` for multi-channel data. Default is 3000.
   :type max_pixel: float or list of float, optional

   :returns: * *ndarray* -- Training images shaped (N_total, H, W, C) with C = `img_num_channels`; values in [0, 1] when `normalize` is True.
             * *ndarray* -- One-hot labels shaped (N_total, 2) with class 1 for `blob_data` and class 0 for `other_data`.

   .. rubric:: Notes

   This function is for binary classification only; for multi-class workflows, call `process_class` per class and concatenate the results.


.. py:function:: signed_log_transform(x, eps=1e-12)

   Apply a signed base-10 logarithmic transform that preserves the sign of each value.

   This is useful for features spanning several orders of magnitude (e.g., Hu moments) that can be positive or negative; zeros remain zero due to the sign factor.

   :param x: Input value(s) to transform.
   :type x: array-like or scalar
   :param eps: Small positive constant added inside the log to avoid log(0). Default is 1e-12.
   :type eps: float, optional

   :returns: Transformed value(s) with the same shape as `x`.
   :rtype: ndarray or scalar
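
The description of `signed_log_transform` can be illustrated with a small standalone sketch. It does not call pyBIA: `signed_log10` is a hypothetical stand-in, and the formula sign(x) · log10(|x| + eps) is an assumption read off the description ("sign factor", "`eps` added inside the log"), not a copy of the library's implementation.

```python
import math


def signed_log10(value, eps=1e-12):
    """Hypothetical stand-in, assuming sign(value) * log10(|value| + eps)."""
    if value == 0:
        # The sign factor is zero, so the result is zero regardless of eps.
        return 0.0
    sign = 1.0 if value > 0 else -1.0
    return sign * math.log10(abs(value) + eps)


# Values spanning several orders of magnitude, both signs, and zero.
samples = [1000.0, -1000.0, 0.0]
transformed = [signed_log10(v) for v in samples]
print(transformed)
```

Under this assumed formula, magnitudes of 1000 map to roughly ±3 and zero stays zero. Inputs with magnitude below 1 have a negative logarithm, so the behaviour for such values is worth checking against the library source before relying on this sketch.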