Analysis¶
Analysis helper functions
Utils¶
-
mesmerize.analysis.utils.
get_array_size
(transmission: mesmerize.analysis.data_types.Transmission, data_column: str) → int[source]¶ Returns the size of the 1D arrays in the specified data column. Throws an exception if they do not match
- Parameters
transmission (Transmission) – Desired Transmission
data_column (str) – Data column of the Transmission from which to retrieve the size
- Returns
Size of the 1D arrays of the specified data column
- Return type
int
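A minimal sketch of the size check described above, using a plain list of NumPy arrays as a stand-in for a Transmission data column. The `array_size` helper and the sample data are hypothetical and not part of mesmerize:

```python
import numpy as np

def array_size(arrays):
    """Return the common size of a sequence of 1D arrays, or raise if they differ."""
    sizes = {np.asarray(a).size for a in arrays}
    if len(sizes) != 1:
        raise ValueError(f"Arrays do not all have the same size: {sorted(sizes)}")
    return sizes.pop()

# Stand-in for the arrays stored in one data column
column = [np.zeros(5), np.ones(5), np.arange(5)]
size = array_size(column)
```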
-
mesmerize.analysis.utils.
get_frequency_linspace
(transmission: mesmerize.analysis.data_types.Transmission) → Tuple[numpy.ndarray, float][source]¶ Get the frequency linspace.
Throws an exception if the datablocks do not all have the same linspace & Nyquist frequencies
- Parameters
transmission – Transmission containing data from which to get frequency linspace
- Returns
tuple: (frequency linspace as a 1D numpy array, nyquist frequency)
- Return type
Tuple[np.ndarray, float]
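For orientation, the relationship between a sampling rate, its Nyquist frequency and a frequency axis can be sketched with NumPy alone. The sampling rate and number of points below are made-up values, and mesmerize may construct its linspace differently:

```python
import numpy as np

sampling_rate = 10.0  # Hz, hypothetical value
n_points = 100        # number of time points per curve, hypothetical value

# The Nyquist frequency is half the sampling rate
nyquist = sampling_rate / 2.0

# Frequencies of the real FFT bins, running from 0 up to the Nyquist frequency
freqs = np.fft.rfftfreq(n_points, d=1.0 / sampling_rate)
```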
-
mesmerize.analysis.utils.
get_proportions
(xs: Union[pandas.core.series.Series, numpy.ndarray, list], ys: Union[pandas.core.series.Series, numpy.ndarray], xs_name: str = 'xs', ys_name: str = 'ys', swap: bool = False, percentages: bool = True) → pandas.core.frame.DataFrame[source]¶ Get the proportions of xs vs ys.
xs & ys are categorical data.
- Parameters
xs (Union[pd.Series, np.ndarray]) – data plotted on the x axis
ys (Union[pd.Series, np.ndarray]) – proportions of unique elements in ys are calculated per xs
xs_name (str) – name for the xs data, useful for labeling the axis in plots
ys_name (str) – name for the ys data, useful for labeling the axis in plots
swap (bool) – swap x and y
- Returns
DataFrame that can be plotted in a proportions bar graph
- Return type
pd.DataFrame
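The idea of "proportions of ys per xs" can be illustrated with the standard library alone. This is only a sketch under assumptions: the `proportions` helper and the sample labels are hypothetical, it always returns percentages, and it returns nested dicts rather than the DataFrame that get_proportions returns:

```python
from collections import Counter, defaultdict

def proportions(xs, ys):
    """For each unique x, return the percentage of each y label paired with it."""
    counts = defaultdict(Counter)
    for x, y in zip(xs, ys):
        counts[x][y] += 1
    return {
        x: {y: 100.0 * n / sum(c.values()) for y, n in c.items()}
        for x, c in counts.items()
    }

# Hypothetical categorical data
xs = ["ctrl", "ctrl", "ctrl", "ctrl", "drug", "drug"]
ys = ["a", "a", "a", "b", "a", "b"]
table = proportions(xs, ys)
```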
-
mesmerize.analysis.utils.
get_sampling_rate
(transmission: mesmerize.analysis.data_types.Transmission, tolerance: Optional[float] = 0.1) → float[source]¶ Returns the mean sampling rate of all data in a Transmission if it is within the specified tolerance. Otherwise throws an exception.
- Parameters
transmission (Transmission) – Transmission object of the data from which sampling rate is obtained.
tolerance (float) – Maximum tolerance (in Hertz) of sampling rate variation between different samples
- Returns
The mean sampling rate of all data in the Transmission
- Return type
float
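A rough sketch of a tolerance check of this kind, assuming the tolerance is compared against the spread between the lowest and highest sampling rate. The `mean_sampling_rate` helper is hypothetical, and the exact comparison used by mesmerize is not stated above:

```python
import numpy as np

def mean_sampling_rate(rates, tolerance=0.1):
    """Return the mean of the rates (Hz) if their spread is within tolerance."""
    rates = np.asarray(rates, dtype=float)
    spread = rates.max() - rates.min()
    if spread > tolerance:
        raise ValueError(
            f"Sampling rates vary by {spread:.3f} Hz, more than the tolerance of {tolerance} Hz"
        )
    return float(rates.mean())

# Hypothetical per-sample sampling rates
rate = mean_sampling_rate([10.0, 10.02, 9.98])
```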
-
mesmerize.analysis.utils.
organize_dataframe_columns
(columns: Iterable[str]) → Tuple[List[str], List[str], List[str]][source]¶ Organizes DataFrame columns into data column, categorical label columns, and uuid columns.
-
mesmerize.analysis.utils.
pad_arrays
(a: numpy.ndarray, method: str = 'random', output_size: Optional[int] = None, mode: str = 'minimum', constant: Optional[Any] = None) → numpy.ndarray[source]¶ Pad all the input arrays so that they are of the same length. The length is determined by the largest input array. By default, the padding value for each input array is the minimum value in that array.
Padding for each input array is either done after the array’s last index to fill up to the length of the largest input array (method ‘fill-size’) or the padding is randomly flanked to the input array (method ‘random’) for easier visualization.
- Parameters
a (np.ndarray) – 1D array where each element is a 1D array
method (str) – one of ‘fill-size’ or ‘random’, see docstring for details
output_size – not used
mode (str) – one of ‘constant’ or ‘minimum’. If ‘minimum’, the minimum value of the array is used as the padding value. If ‘constant’, the value passed to the “constant” argument is used as the padding value.
constant (Any) – padding value if ‘mode’ is set to ‘constant’
- Returns
Arrays padded according to the chosen method. 2D array of shape [n_arrays, size of largest input array]
- Return type
np.ndarray
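The two padding methods can be sketched with numpy.pad. The helpers below are hypothetical re-implementations for illustration only; they always pad with each array's minimum value and do not reproduce the “mode” or “constant” arguments:

```python
import numpy as np

def pad_fill_size(arrays):
    """Pad after each array's last index, up to the length of the largest array."""
    size = max(len(a) for a in arrays)
    out = np.empty((len(arrays), size), dtype=float)
    for i, a in enumerate(arrays):
        a = np.asarray(a, dtype=float)
        out[i] = np.pad(a, (0, size - len(a)), mode="constant", constant_values=a.min())
    return out

def pad_random(arrays, rng=None):
    """Split the padding randomly between the two ends of each array."""
    rng = np.random.default_rng() if rng is None else rng
    size = max(len(a) for a in arrays)
    out = np.empty((len(arrays), size), dtype=float)
    for i, a in enumerate(arrays):
        a = np.asarray(a, dtype=float)
        total = size - len(a)
        before = int(rng.integers(0, total + 1))  # number of padded values before the array
        out[i] = np.pad(a, (before, total - before), mode="constant", constant_values=a.min())
    return out

curves = [np.array([3.0, 1.0, 2.0]), np.array([5.0, 4.0])]
padded = pad_fill_size(curves)
flanked = pad_random(curves, rng=np.random.default_rng(0))
```

Both helpers return a 2D array of shape [n_arrays, size of largest input array], matching the return value described above.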
Cross correlation¶
functions¶
Helper functions. Uses tslearn.cycc
-
mesmerize.analysis.math.cross_correlation.
ncc_c
(x: numpy.ndarray, y: numpy.ndarray) → numpy.ndarray[source]¶ Must pass 1D array to both x and y
- Parameters
x – Input array [x1, x2, x3, … xn]
y – Input array [y1, y2, y3, … yn]
- Returns
Returns the normalized cross correlation function (as an array) of the two input vector arguments “x” and “y”
- Return type
np.ndarray
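As a point of reference, a normalized cross-correlation of two 1D arrays can be sketched with NumPy as follows. This is an assumption-laden stand-in, not the tslearn-based implementation used by ncc_c; the `normalized_cc` name is hypothetical:

```python
import numpy as np

def normalized_cc(x, y):
    """Full cross-correlation of x and y, divided by the product of their norms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0:
        # Avoid dividing by zero for all-zero inputs
        return np.zeros(len(x) + len(y) - 1)
    return np.correlate(x, y, mode="full") / denom

x = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
cc = normalized_cc(x, x)  # auto-correlation, length 2 * len(x) - 1
```

For two inputs of length n the result has 2n - 1 elements, and the middle element corresponds to zero shift.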
-
mesmerize.analysis.math.cross_correlation.
get_omega
(x: Optional[numpy.ndarray] = None, y: Optional[numpy.ndarray] = None, cc: Optional[numpy.ndarray] = None) → int[source]¶ Must pass a 1D array to either both “x” and “y” or a cross-correlation function (as an array) to “cc”
- Parameters
x – Input array [x1, x2, x3, … xn]
y – Input array [y1, y2, y3, … yn]
cc – cross-correlation function represented as an array [c1, c2, c3, … cn]
- Returns
index (x-axis position) of the global maximum of the cross-correlation function
- Return type
int
-
mesmerize.analysis.math.cross_correlation.
get_lag
(x: Optional[numpy.ndarray] = None, y: Optional[numpy.ndarray] = None, cc: Optional[numpy.ndarray] = None) → float[source]¶ Must pass a 1D array to either both “x” and “y” or a cross-correlation function (as an array) to “cc”
- Parameters
x – Input array [x1, x2, x3, … xn]
y – Input array [y1, y2, y3, … yn]
cc – cross-correlation function represented as an array [c1, c2, c3, … cn]
- Returns
Position of the maximum of the cross-correlation function with respect to the middle point of the cross-correlation function
- Return type
float
-
mesmerize.analysis.math.cross_correlation.
get_epsilon
(x: Optional[numpy.ndarray] = None, y: Optional[numpy.ndarray] = None, cc: Optional[numpy.ndarray] = None) → float[source]¶ Must pass a 1D vector to either both “x” and “y” or a cross-correlation function to “cc”
- Parameters
x – Input array [x1, x2, x3, … xn]
y – Input array [y1, y2, y3, … yn]
cc – cross-correlation function represented as an array [c1, c2, c3, … cn]
- Returns
Magnitude of the global maximum of the cross-correlation function
- Return type
float
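The three quantities returned by get_omega, get_lag and get_epsilon can be illustrated on a small, made-up cross-correlation array. The sketch assumes a "full" cross-correlation whose middle element corresponds to zero shift; any scaling or sign convention applied by mesmerize is not shown:

```python
import numpy as np

# Hypothetical cross-correlation function of two curves of length 5 (2 * 5 - 1 = 9 points)
cc = np.array([0.0, 0.1, 0.3, 0.2, 0.5, 0.9, 0.4, 0.1, 0.0])

omega = int(np.argmax(cc))     # index of the global maximum
midpoint = (cc.size - 1) / 2   # index that corresponds to zero shift
lag = omega - midpoint         # position of the maximum relative to the middle point
epsilon = float(cc.max())      # magnitude of the global maximum
```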
-
mesmerize.analysis.math.cross_correlation.
get_lag_matrix
(curves: Optional[numpy.ndarray] = None, ccs: Optional[numpy.ndarray] = None) → numpy.ndarray[source]¶ Get a 2D matrix of lags. Can pass either a 2D array of 1D curves or cross-correlations
- Parameters
curves – 2D array of 1D curves
ccs – 2D array of 1D cross-correlation functions represented by arrays
- Returns
2D matrix of lag values, shape is [n_curves, n_curves]
- Return type
np.ndarray
-
mesmerize.analysis.math.cross_correlation.
get_epsilon_matrix
(curves: Optional[numpy.ndarray] = None, ccs: Optional[numpy.ndarray] = None) → numpy.ndarray[source]¶ Get a 2D matrix of maxima. Can pass either a 2D array of 1D curves or cross-correlations
- Parameters
curves – 2D array of 1D curves
ccs – 2D array of 1D cross-correlation functions represented by arrays
- Returns
2D matrix of maxima values, shape is [n_curves, n_curves]
- Return type
np.ndarray
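A sketch of how pairwise lag and maxima matrices of shape [n_curves, n_curves] could be built from a 2D array of curves. The `cc_matrices` helper is hypothetical and uses a NumPy-only normalized cross-correlation, so values and sign conventions may differ from the mesmerize functions:

```python
import numpy as np

def cc_matrices(curves):
    """Return (lag_matrix, epsilon_matrix) for every pair of curves."""
    curves = np.asarray(curves, dtype=float)
    n_curves, size = curves.shape
    midpoint = size - 1  # zero-shift index of a "full" cross-correlation
    norms = np.linalg.norm(curves, axis=1)
    lags = np.zeros((n_curves, n_curves))
    epsilons = np.zeros((n_curves, n_curves))
    for i in range(n_curves):
        for j in range(n_curves):
            denom = norms[i] * norms[j]
            cc = np.correlate(curves[i], curves[j], mode="full")
            cc = cc / denom if denom > 0 else np.zeros_like(cc)
            k = int(np.argmax(cc))
            lags[i, j] = k - midpoint   # shift at which the two curves align best
            epsilons[i, j] = cc[k]      # height of the cross-correlation peak
    return lags, epsilons

# Three hypothetical curves: the second is the first shifted by one sample
curves = np.array([
    [0.0, 1.0, 2.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0, 1.0, 0.0],
    [0.0, 1.0, 2.0, 1.0, 0.0, 0.0],
])
lag_matrix, epsilon_matrix = cc_matrices(curves)
```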
-
mesmerize.analysis.math.cross_correlation.
compute_cc_data
(curves: numpy.ndarray) → mesmerize.analysis.math.cross_correlation.CC_Data[source]¶ Compute cross-correlation data (cc functions, lag and maxima matrices)
- Parameters
curves – input curves as a 2D array, shape is [n_samples, curve_size]
- Returns
cross correlation data for the input curves as a CC_Data instance
- Return type
CC_Data
CC_Data¶
Data container
Warning
All arguments MUST be of type numpy.ndarray for a CC_Data instance to be saveable as an hdf5 file. Set numpy.unicode as the dtype for the curve_uuids and labels arrays. If the dtype is 'O' (object) the to_hdf5() method will fail.
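The difference between a unicode dtype and an object dtype can be checked before constructing a CC_Data instance. The sketch below passes the builtin `str` as the dtype, which yields a fixed-width unicode array; this is used here on the assumption that the `numpy.unicode` alias is unavailable in recent NumPy releases. The label values are made up:

```python
import numpy as np

labels = ["ctrl", "drug", "ctrl"]  # hypothetical labels

unicode_labels = np.array(labels, dtype=str)    # fixed-width unicode dtype, kind 'U'
object_labels = np.array(labels, dtype=object)  # object dtype, kind 'O'

# An object array of strings can be converted before it is passed on
converted = object_labels.astype(str)
```

Inspecting `array.dtype.kind` is a quick way to confirm that an array is unicode ('U') rather than object ('O').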
-
class
mesmerize.analysis.cross_correlation.
CC_Data
(input_data: Optional[numpy.ndarray] = None, ccs: Optional[numpy.ndarray] = None, lag_matrix: Optional[numpy.ndarray] = None, epsilon_matrix: Optional[numpy.ndarray] = None, curve_uuids: Optional[numpy.ndarray] = None, labels: Optional[numpy.ndarray] = None)¶ -
__init__
(input_data: Optional[numpy.ndarray] = None, ccs: Optional[numpy.ndarray] = None, lag_matrix: Optional[numpy.ndarray] = None, epsilon_matrix: Optional[numpy.ndarray] = None, curve_uuids: Optional[numpy.ndarray] = None, labels: Optional[numpy.ndarray] = None)¶ Object for organizing cross-correlation data
types must be numpy.ndarray to be compatible with hdf5
- Parameters
ccs (np.ndarray) – array of cross-correlation functions, shape: [n_curves, n_curves, func_length]
lag_matrix (np.ndarray) – the lag matrix, shape: [n_curves, n_curves]
epsilon_matrix (np.ndarray) – the maxima matrix, shape: [n_curves, n_curves]
curve_uuids (np.ndarray) – uuids (str representation) for each of the curves, length: n_curves
labels (np.ndarray) – labels for each curve, length: n_curves
-
ccs
¶ array of cross-correlation functions
-
lag_matrix
¶ lag matrix
-
epsilon_matrix
¶ maxima matrix
-
curve_uuids
¶ uuids for each curve
-
labels
¶ labels for each curve
-
get_threshold_matrix
(matrix_type: str, lag_thr: float, max_thr: float, lag_thr_abs: bool = True) → numpy.ndarray¶ Get lag or maxima matrix with thresholds applied. Values outside the threshold are set to NaN
- Parameters
matrix_type – one of ‘lag’ or ‘maxima’
lag_thr – lag threshold
max_thr – maxima threshold
lag_thr_abs – threshold with the absolute value of lag
- Returns
the requested matrix with the thresholds applied to it.
- Return type
np.ndarray
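One plausible reading of the thresholding, sketched as a standalone function: entries are kept only where the lag does not exceed lag_thr and the maximum is at least max_thr, and everything else becomes NaN. The `threshold_matrix` helper and the direction of both comparisons are assumptions, not taken from the mesmerize source:

```python
import numpy as np

def threshold_matrix(matrix, lag_matrix, epsilon_matrix, lag_thr, max_thr, lag_thr_abs=True):
    """Return a copy of matrix with entries outside the thresholds set to NaN."""
    lags = np.abs(lag_matrix) if lag_thr_abs else lag_matrix
    keep = (lags <= lag_thr) & (epsilon_matrix >= max_thr)  # assumed comparison directions
    return np.where(keep, matrix, np.nan)

# Hypothetical lag and maxima matrices for two curves
lag_matrix = np.array([[0.0, 3.0], [-3.0, 0.0]])
epsilon_matrix = np.array([[1.0, 0.9], [0.9, 1.0]])

thresholded_lags = threshold_matrix(lag_matrix, lag_matrix, epsilon_matrix, lag_thr=2.0, max_thr=0.5)
```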
-
Clustering metrics¶
-
mesmerize.analysis.clustering_metrics.
get_centerlike
(cluster_members: numpy.ndarray, metric: Optional[Union[str, callable]] = None, dist_matrix: Optional[numpy.ndarray] = None) → Tuple[numpy.ndarray, int][source]¶ Finds the 1D time-series within a cluster that is the most centerlike
- Parameters
cluster_members – 2D numpy array in the form [n_samples, 1D time_series]
metric – Metric to use for pairwise distance calculation, simply passed to sklearn.metrics.pairwise_distances
dist_matrix – Distance matrix of the cluster members
- Returns
The cluster member which is most centerlike, and its index in the cluster_members array
-
mesmerize.analysis.clustering_metrics.
get_cluster_radius
(cluster_members: numpy.ndarray, metric: Optional[Union[str, callable]] = None, dist_matrix: Optional[numpy.ndarray] = None, centerlike_index: Optional[int] = None) → float[source]¶ Returns the cluster radius according to chosen distance metric
- Parameters
cluster_members – 2D numpy array in the form [n_samples, 1D time_series]
metric – Metric to use for pairwise distance calculation, simply passed to sklearn.metrics.pairwise_distances
dist_matrix – Distance matrix of the cluster members
centerlike_index – Index of the centerlike cluster member within the cluster_members array
- Returns
The cluster radius: the average distance between the most centerlike member and all other members
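A NumPy-only sketch of both ideas, assuming that "most centerlike" means the member with the smallest total distance to all other members and that the radius excludes the member's zero distance to itself. The helper names are hypothetical, and Euclidean distance stands in for whatever metric would be passed to sklearn.metrics.pairwise_distances:

```python
import numpy as np

def pairwise_euclidean(members):
    """Distance matrix of shape [n_samples, n_samples]."""
    diff = members[:, None, :] - members[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def centerlike(members, dist_matrix=None):
    """Return the member with the smallest summed distance to the others, and its index."""
    if dist_matrix is None:
        dist_matrix = pairwise_euclidean(members)
    index = int(np.argmin(dist_matrix.sum(axis=1)))
    return members[index], index

def cluster_radius(members, dist_matrix=None, centerlike_index=None):
    """Mean distance between the centerlike member and all other members."""
    if dist_matrix is None:
        dist_matrix = pairwise_euclidean(members)
    if centerlike_index is None:
        _, centerlike_index = centerlike(members, dist_matrix)
    others = np.delete(dist_matrix[centerlike_index], centerlike_index)
    return float(others.mean()) if others.size else 0.0

# Three hypothetical cluster members on a line
members = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
center, center_index = centerlike(members)
radius = cluster_radius(members)
```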
-
mesmerize.analysis.clustering_metrics.
davies_bouldin_score
(data: numpy.ndarray, cluster_labels: numpy.ndarray, metric: Union[str, callable]) → Tuple[float, numpy.ndarray][source]¶ Adapted from sklearn.metrics.davies_bouldin_score to use any distance metric
- Parameters
data – Data that was used for clustering, [n_samples, 1D time_series]
metric – Metric to use for pairwise distance calculation, simply passed to sklearn.metrics.pairwise_distances
cluster_labels – Cluster labels
- Returns
Davies Bouldin Score using EMD
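A sketch of a Davies-Bouldin style score that accepts an arbitrary distance callable. It is written under several assumptions: the most centerlike member replaces the centroid, the cluster radius is the mean distance from that member to the other members, and the second returned value holds the per-cluster maxima. The `db_score` helper is hypothetical and does not reproduce the mesmerize implementation:

```python
import numpy as np

def db_score(data, cluster_labels, metric):
    """Return (score, per_cluster_values) for a distance callable metric(a, b)."""
    data = np.asarray(data, dtype=float)
    cluster_labels = np.asarray(cluster_labels)
    unique_labels = np.unique(cluster_labels)
    if unique_labels.size < 2:
        raise ValueError("At least two clusters are required")

    centers, radii = [], []
    for label in unique_labels:
        members = data[cluster_labels == label]
        dist = np.array([[metric(a, b) for b in members] for a in members])
        idx = int(np.argmin(dist.sum(axis=1)))  # most centerlike member
        others = np.delete(dist[idx], idx)      # distances to the other members
        centers.append(members[idx])
        radii.append(float(others.mean()) if others.size else 0.0)

    n_clusters = unique_labels.size
    per_cluster = np.zeros(n_clusters)
    for i in range(n_clusters):
        # Worst-case similarity of cluster i to any other cluster;
        # this divides by zero if two clusters share the same center
        per_cluster[i] = max(
            (radii[i] + radii[j]) / metric(centers[i], centers[j])
            for j in range(n_clusters)
            if j != i
        )
    return float(per_cluster.mean()), per_cluster

def manhattan(a, b):
    """Example metric: sum of absolute differences."""
    return float(np.abs(a - b).sum())

# Two hypothetical, well separated clusters of 1D points
data = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
cluster_labels = np.array([0, 0, 0, 1, 1, 1])
score, per_cluster = db_score(data, cluster_labels, manhattan)
```

Lower scores indicate clusters that are compact relative to the distance between them.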