RQA2#
The RQA2 module provides a modern, object-oriented interface for
Recurrence Quantification Analysis (RQA). It supersedes the procedural
functions in SMdRQA.RQA_functions and adds surrogate-data testing and
chaotic-system simulation in a single coherent package.
RQA2#
- class RQA2(data=None, normalize=True, **kwargs)[source]#
Bases: object
Comprehensive Recurrence Quantification Analysis class that handles all RQA computations, visualizations, and batch processing in an object-oriented manner. All indexing is 0-based.
- classmethod batch_process(input_path, output_path, group_level=False, group_level_estimates=None, **kwargs)[source]#
Analyse all .npy files in input_path and write results to output_path.
Two CSV files are written: rqa_results.csv (one row per file with all RQA measures) and error_report.csv (files that could not be processed). Recurrence plots are saved as .npy arrays alongside the CSVs.
- Parameters:
input_path (str or path-like) – Directory containing .npy time-series files.
output_path (str or path-like) – Directory where outputs are written (created if absent).
group_level (bool, default False) – If True, replace per-file parameters with group averages for the parameters listed in group_level_estimates.
group_level_estimates (list of {'tau', 'm', 'eps'}, optional) – Which parameters to estimate at the group level.
**kwargs – Passed to RQA2.__init__ for each file.
- Returns:
results (list of dict) – Per-file dictionaries containing file name, RQA parameters, and measure values.
error_files (list of dict) – Per-file dictionaries for failed files, containing 'file' and 'error' keys.
- compute_embedded_signal()[source]#
Build the time-delay embedding tensor.
Constructs delay vectors of the form [x(t), x(t+τ), …, x(t+(m-1)τ)] for each valid time index t.
- Returns:
ndarray, shape (N_embedded, m, D) – Delay-embedded signal where N_embedded = N - (m-1)*τ, m is the embedding dimension, and D is the number of original signal dimensions.
- Raises:
ValueError – If no data have been loaded or if the data are too short for the chosen (τ, m) combination.
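The embedding construction described above can be sketched with plain NumPy. This is an illustrative standalone version, not the package's implementation; the function name `delay_embed` is hypothetical.

```python
import numpy as np

def delay_embed(x, tau, m):
    """Stack delay vectors [x(t), x(t+tau), ..., x(t+(m-1)*tau)].

    x has shape (N,) or (N, D); the result has shape (N - (m-1)*tau, m, D).
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:                      # promote a 1-D series to (N, 1)
        x = x[:, None]
    n_embedded = x.shape[0] - (m - 1) * tau
    if n_embedded < 1:
        raise ValueError("series too short for this (tau, m) combination")
    # element [t, i, d] is x[t + i*tau, d]
    return np.stack([x[i * tau : i * tau + n_embedded] for i in range(m)], axis=1)

emb = delay_embed(np.arange(10.0), tau=2, m=3)
# shape (6, 3, 1); the first delay vector is [0, 2, 4]
```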
- compute_embedding_dimension()[source]#
Estimate the optimal embedding dimension m via False Nearest Neighbours.
Uses the FNN criterion: the embedding dimension is the smallest m for which the fraction of false nearest neighbours drops below self.config['bound'].
- Returns:
int – Embedding dimension m ≥ 1.
- Raises:
ValueError – If no data have been loaded.
- compute_neighborhood_radius(reqrr=None)[source]#
Find the neighbourhood radius ε that achieves a target recurrence rate.
The search scans self.config['epsdiv'] evenly-spaced candidate values between epsmin and epsmax and returns the first ε for which |RR − reqrr| < rr_delta.
- Parameters:
reqrr (float, optional) – Target recurrence rate in (0, 1). Clamped to [0.01, 0.99]. Defaults to self.config['reqrr'].
- Returns:
float – Neighbourhood radius ε > 0.
- Raises:
ValueError – If no data have been loaded.
- compute_recurrence_plot()[source]#
Build the binary recurrence plot matrix from the current parameters.
Uses the delay-embedded signal with the stored (or lazily computed) τ, m, and ε. Two delay vectors are considered recurrent when their Euclidean distance is less than ε.
- Returns:
ndarray of int, shape (N_embedded, N_embedded) – Symmetric binary matrix; 1 indicates recurrence, 0 otherwise.
- Raises:
ValueError – If no data have been loaded or if the embedding parameters yield an empty embedded signal.
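The thresholded-distance construction is straightforward to sketch for any embedded signal (a minimal NumPy illustration under the same Euclidean-distance convention; names are illustrative):

```python
import numpy as np

def recurrence_plot(emb, eps):
    """Binary RP: 1 where the Euclidean distance between delay vectors < eps."""
    v = emb.reshape(len(emb), -1)          # flatten each (m, D) delay vector
    d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1)
    return (d < eps).astype(int)

# embedded sine (m=1, D=1 for brevity)
emb = np.sin(np.linspace(0, 8 * np.pi, 200))[:, None, None]
rp = recurrence_plot(emb, eps=0.2)
rr = rp.mean()                              # recurrence rate of the RP
```

The matrix is symmetric with a line of identity of ones, and `rr` is the fraction of recurrent points.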
- compute_rqa_measures(lmin=None)[source]#
Compute the full set of RQA measures from the recurrence plot.
- Parameters:
lmin (int, optional) – Minimum line length for DET and LAM computation. Defaults to self.config['lmin'] (typically 2).
- Returns:
dict – Dictionary with the following keys:
recurrence_rate (float) – Fraction of recurrent points (RR).
determinism (float) – Fraction of recurrent points on diagonal lines ≥ lmin (DET).
laminarity (float) – Fraction of recurrent points on vertical lines ≥ lmin (LAM).
diagonal_entropy (float) – Shannon entropy of the diagonal line-length distribution (L_entr).
vertical_entropy (float) – Shannon entropy of the vertical line-length distribution (V_entr).
average_diagonal_length (float) – Mean length of diagonal lines ≥ lmin (L).
average_vertical_length (float) – Mean length of vertical lines ≥ lmin (TT – trapping time).
max_diagonal_length (float) – Maximum diagonal line length (L_max).
max_vertical_length (float) – Maximum vertical line length (V_max).
diagonal_mode (float) – Mode of the diagonal line-length distribution.
vertical_mode (float) – Mode of the vertical line-length distribution.
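To make the diagonal-line measures concrete, here is a hedged sketch of how DET follows from the definitions above (diagonal lines counted off the line of identity; this is illustrative, not the package's exact code):

```python
import numpy as np

def diagonal_lengths(rp):
    """Lengths of diagonal runs of recurrent points above the line of identity."""
    n = rp.shape[0]
    lengths = []
    for k in range(1, n):                  # each upper off-diagonal
        diag = np.diag(rp, k)
        run = 0
        for v in list(diag) + [0]:         # trailing 0 closes an open run
            if v:
                run += 1
            elif run:
                lengths.append(run)
                run = 0
    return lengths

def det(rp, lmin=2):
    """DET: fraction of off-diagonal recurrent points on lines >= lmin."""
    lengths = np.array(diagonal_lengths(rp))
    total = lengths.sum()
    return 0.0 if total == 0 else lengths[lengths >= lmin].sum() / total

# toy RP: LOI plus one diagonal line of length 4 on each side
rp = np.eye(6, dtype=int) + np.eye(6, k=2, dtype=int) + np.eye(6, k=-2, dtype=int)
```

For a symmetric RP the upper triangle suffices, since the ratio is unchanged by mirroring.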
- compute_time_delay(method=None, mi_method=None)[source]#
Estimate the optimal time delay τ from the mutual-information curve.
- Parameters:
method ({'default', 'polynomial'}, optional) – Algorithm used to locate the minimum of the MI curve. 'default' selects the first local minimum; 'polynomial' fits a cross-validated polynomial and uses its first minimum. Defaults to self.config['tau_method'].
mi_method ({'histdd', 'avg'}, optional) – Mutual-information estimator. Defaults to self.config['mi_method'].
- Returns:
int – Optimal time delay τ ≥ 1.
- Raises:
ValueError – If no data have been loaded.
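The first-local-minimum rule can be sketched with a simple histogram MI estimator (an assumption for illustration; the package's 'histdd' and 'avg' estimators may differ in detail, and the helper names here are hypothetical):

```python
import numpy as np

def hist_mi(x, y, bins=16):
    """Mutual information (nats) from a 2-D histogram estimate."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def first_min_tau(x, max_tau=50):
    """Smallest tau at which MI(x_t, x_{t+tau}) has a local minimum."""
    mi = [hist_mi(x[:-t], x[t:]) for t in range(1, max_tau + 1)]
    for t in range(1, len(mi) - 1):
        if mi[t] < mi[t - 1] and mi[t] < mi[t + 1]:
            return t + 1                   # mi[0] corresponds to tau = 1
    return int(np.argmin(mi)) + 1

t = np.arange(2000)
x = np.sin(2 * np.pi * t / 40)             # period-40 sine
tau = first_min_tau(x)                     # near a quarter period
```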
- compute_windowed_rqa_measures(window_size, window_step=1, lmin=None)[source]#
Compute RQA measures over sliding diagonal windows of the recurrence plot.
- Parameters:
window_size (int) – Size of the square window to slide along the RP diagonal.
window_step (int, default 1) – Step size for the sliding window.
lmin (int, optional) – Minimum line length for DET and LAM computation.
- Returns:
pandas.DataFrame – Per-window RQA measures indexed by window start position.
- determinism(lmin=None)[source]#
Return the determinism (DET) measure.
- Parameters:
lmin (int, optional) – Minimum diagonal line length threshold.
- Returns:
float – DET in [0, 1].
- property embedded_signal#
Time-delayed embedded signal.
- property eps#
Neighborhood radius.
- get_summary()[source]#
Return a nested dictionary summarising data info, parameters, and (if already computed) RQA measures.
- Returns:
dict – Top-level keys:
'Data Info', 'Parameters', and optionally 'RQA Measures'.
- laminarity(lmin=None)[source]#
Return the laminarity (LAM) measure.
- Parameters:
lmin (int, optional) – Minimum vertical line length threshold.
- Returns:
float – LAM in [0, 1].
- load_data(data, normalize=True)[source]#
Load (and optionally normalise) time-series data, resetting all caches.
- Parameters:
data (array_like) – Time-series array of shape (N,) or (N, D).
normalize (bool, default True) – Apply z-score normalisation column-wise.
- Returns:
None
- load_results(filepath)[source]#
Restore RQA state from a pickle file previously written by save_results().
- Parameters:
filepath (str or path-like) – Path to a .pkl file created by save_results().
- Returns:
None
- property m#
Embedding dimension.
- plot_fnn_curve(max_m=None, figsize=(10, 6), save_path=None)[source]#
Plot the false nearest-neighbours ratio as a function of embedding dimension m.
A vertical dashed line marks the automatically selected m value.
- Parameters:
max_m (int, optional) – Maximum embedding dimension to evaluate.
figsize (tuple of float, default (10, 6)) – Figure dimensions in inches.
save_path (str or path-like, optional) – Destination path for saving the figure at 300 dpi.
- Returns:
None
- plot_recurrence_plot(figsize=(8, 8), title=None, save_path=None)[source]#
Display the recurrence plot.
- Parameters:
figsize (tuple of float, default (8, 8)) – Width and height of the figure in inches.
title (str, optional) – Figure title. Defaults to 'Recurrence Plot (RR=<value>)'.
save_path (str or path-like, optional) – If given, the figure is saved to this path at 300 dpi before being displayed.
- Returns:
None
- plot_rqa_measures_summary(figsize=(12, 8), save_path=None)[source]#
Display a 2×3 panel summarising the main RQA measures and parameters.
Panels: main measures (RR, DET, LAM), entropy measures, average line lengths, maximum line lengths, mode line lengths, and RQA parameters (τ, m, ε).
- Parameters:
figsize (tuple of float, default (12, 8)) – Figure dimensions in inches.
save_path (str or path-like, optional) – Destination path for saving the figure at 300 dpi.
- Returns:
None
- plot_tau_mi_curve(max_tau=None, figsize=(10, 6), save_path=None)[source]#
Plot the mutual information as a function of time delay τ.
A vertical dashed line marks the automatically selected τ value.
- Parameters:
max_tau (int, optional) – Maximum τ to evaluate. Defaults to min(100, N/4).
figsize (tuple of float, default (10, 6)) – Figure dimensions in inches.
save_path (str or path-like, optional) – Destination path for saving the figure at 300 dpi.
- Returns:
None
- plot_time_series(figsize=(12, 6), save_path=None)[source]#
Plot the original (unnormalised) time-series data.
For univariate data a single line plot is drawn; for multivariate data each of the first five dimensions is shown in a stacked subplot.
- Parameters:
figsize (tuple of float, default (12, 6)) – Figure dimensions in inches.
save_path (str or path-like, optional) – Destination path for saving the figure at 300 dpi.
- Returns:
None
- Raises:
ValueError – If no data have been loaded.
- property recurrence_plot#
Recurrence plot matrix.
- property recurrence_rate#
Recurrence rate of the RP.
- save_results(filepath)[source]#
Serialise all computed RQA state to a pickle file.
Saved fields: data, tau, m, eps, recurrence_plot, rqa_measures, config.
- Parameters:
filepath (str or path-like) – Destination file path (e.g. 'analysis.pkl').
- Returns:
None
- summarize_windowed_measures(windowed_df, stats=('mean', 'median', 'mode'))[source]#
Summarize windowed RQA measures with aggregate statistics.
- Parameters:
windowed_df (pandas.DataFrame) – Output from compute_windowed_rqa_measures().
stats (tuple of {'mean', 'median', 'mode'}, default ('mean', 'median', 'mode')) – Summary statistics to compute.
- Returns:
dict – Flat dictionary of summary features.
- property tau#
Time delay parameter.
RQA2_simulators#
- class RQA2_simulators(seed: int | None = None)[source]#
Bases: object
Generators for chaotic dynamical systems used to test RQA2 and surrogate methods.
All ODE systems are integrated with scipy.integrate.solve_ivp using the RK45 solver at tight tolerances (rtol=1e-9, atol=1e-12). Discrete maps are iterated directly.
- Parameters:
seed (int or None, optional) – Seed for the internal numpy.random.Generator. Use an integer for reproducible results.
Examples
>>> sim = RQA2_simulators(seed=42)
>>> x, y, z = sim.rossler(n=1000)
>>> battery = sim.generate_test_battery()
- chua(tmax: int = 10000, n: int = 2000, Xi: Tuple[float, float, float] = (0.1, 0.1, 0.1), alpha: float = 15.6, beta: float = 28.0, m0: float = -1.143, m1: float = -0.714, dt: float = 0.01) Tuple[ndarray, ndarray, ndarray][source]#
Simulate Chua’s circuit attractor.
Uses the piecewise-linear Chua diode characteristic h(x) and the three-dimensional ODE:
dx/dt = α·(y - x - h(x))
dy/dt = x - y + z
dz/dt = -β·y
- Parameters:
tmax (int, default 10000) – Number of integration steps.
n (int, default 2000) – Number of output points.
Xi (tuple of float, default (0.1, 0.1, 0.1)) – Initial conditions (x₀, y₀, z₀).
alpha, beta (float) – Circuit parameters.
m0, m1 (float) – Slopes of the piecewise-linear Chua diode.
dt (float, default 0.01) – Integration step size.
- Returns:
x, y, z (ndarray, shape (n,)) – State variables at n equidistant times.
- generate_test_battery() dict[source]#
Generate a standard battery of chaotic and periodic test signals.
- Returns:
dict – Keys:
'rossler_chaotic', 'rossler_sync', 'lorenz', 'henon', 'chua'. Each value is a dict with keys 'x', 'y', (optionally 'z'), and 'regime'.
- henon(n: int = 2000, Xi: Tuple[float, float] = (0.1, 0.1), a: float = 1.4, b: float = 0.3) Tuple[ndarray, ndarray][source]#
Simulate the Hénon map.
Defined by the 2-D discrete-time recurrence:
x_{n+1} = 1 - a·x_n² + y_n
y_{n+1} = b·x_n
The classic chaotic attractor uses a=1.4, b=0.3.
- Parameters:
n (int, default 2000) – Number of iterations.
Xi (tuple of float, default (0.1, 0.1)) – Initial conditions (x₀, y₀).
a, b (float) – Map parameters.
- Returns:
X, Y (ndarray, shape (n,)) – Trajectory of the x and y coordinates.
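Because the map is a plain recurrence, it is easy to iterate directly (a minimal standalone sketch, independent of the class internals):

```python
import numpy as np

def henon(n=2000, x0=0.1, y0=0.1, a=1.4, b=0.3):
    """Iterate the Henon map n times from (x0, y0)."""
    x = np.empty(n)
    y = np.empty(n)
    x[0], y[0] = x0, y0
    for i in range(n - 1):
        x[i + 1] = 1.0 - a * x[i] ** 2 + y[i]
        y[i + 1] = b * x[i]
    return x, y

X, Y = henon(n=2000)    # classic chaotic parameters; orbit stays bounded
```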
- lorenz(tmax: int = 10000, n: int = 2000, Xi: Tuple[float, float, float] = (1.0, 1.0, 1.0), sigma: float = 10.0, rho: float = 28.0, beta: float = 2.6666666666666665, dt: float = 0.01) Tuple[ndarray, ndarray, ndarray][source]#
Simulate the Lorenz attractor.
The system is defined by:
dx/dt = σ·(y - x)
dy/dt = ρ·x - y - x·z
dz/dt = x·y - β·z
The classic chaotic butterfly attractor uses σ=10, ρ=28, β=8/3.
- Parameters:
tmax (int, default 10000) – Number of integration steps.
n (int, default 2000) – Number of output points.
Xi (tuple of float, default (1.0, 1.0, 1.0)) – Initial conditions (x₀, y₀, z₀).
sigma, rho, beta (float) – System parameters.
dt (float, default 0.01) – Integration step size.
- Returns:
x, y, z (ndarray, shape (n,)) – State variables at n equidistant times.
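The same system can be integrated directly with SciPy using the solver settings stated for this class (RK45, rtol=1e-9, atol=1e-12). The choice of time horizon and output grid below is an assumption for illustration; the class's own sub-sampling may differ.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz_rhs(t, s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return [sigma * (y - x), rho * x - y - x * z, x * y - beta * z]

t_end = 100.0
t_eval = np.linspace(0.0, t_end, 2000)     # n equidistant output times
sol = solve_ivp(lorenz_rhs, (0.0, t_end), [1.0, 1.0, 1.0],
                method="RK45", rtol=1e-9, atol=1e-12, t_eval=t_eval)
x, y, z = sol.y                            # each has shape (2000,)
```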
- rossler(tmax: int = 10000, n: int = 2000, Xi: Tuple[float, float, float] = (1.0, 1.0, 1.0), a: float = 0.2, b: float = 0.2, c: float = 5.7, dt: float = 0.01) Tuple[ndarray, ndarray, ndarray][source]#
Simulate the Rössler attractor.
The system is defined by:
dx/dt = -y - z
dy/dt = x + a·y
dz/dt = b + z·(x - c)
With b=0.2, c=5.7: chaotic for a ≲ 0.2, periodic for a ≳ 0.2.
- Parameters:
tmax (int, default 10000) – Number of integration steps.
n (int, default 2000) – Number of output points (sub-sampled from the solution).
Xi (tuple of float, default (1.0, 1.0, 1.0)) – Initial conditions (x₀, y₀, z₀).
a, b, c (float) – System parameters. Classic chaotic: a=0.2, b=0.2, c=5.7.
dt (float, default 0.01) – Integration step size.
- Returns:
x, y, z (ndarray, shape (n,)) – State variables sampled at n equidistant times.
- seed: int | None = None#
RQA2_tests#
- class RQA2_tests(signal: ndarray, seed: int | None = None, max_workers: int = 1)[source]#
Bases: object
Surrogate-data generation and statistical validation for nonlinear dynamics testing.
Implements six surrogate algorithms and a comprehensive validation framework that tests each algorithm against multiple nonlinear metrics (Lyapunov exponent, sample entropy, correlation dimension, etc.).
- Parameters:
signal (ndarray of float) – 1-D floating-point time series to be tested.
seed (int or None, optional) – Seed for reproducible surrogate generation.
max_workers (int, default 1) – Number of parallel workers for surrogate generation (parallelism is used when n_surrogates >= 50).
- Raises:
TypeError – If signal is not a floating-point array.
Examples
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> signal = rng.standard_normal(512).astype(float)
>>> tester = RQA2_tests(signal, seed=42)
>>> surrogates = tester.generate('IAAFT', n_surrogates=100)
>>> surrogates.shape
(100, 512)
- comprehensive_validation(systems_data: Dict[str, Dict[str, ndarray]], n_surrogates: int = 200, save_path: str = 'surrogate_validation_results.png') Dict[str, Dict[str, float]][source]#
Run all six surrogate algorithms against multiple dynamical systems and return rank-based two-sided p-values for six nonlinear metrics.
A heatmap of the results is saved to save_path.
- Parameters:
systems_data (dict) – Mapping from system name (str) to a dict with at least an 'x' key containing a 1-D ndarray (the x-coordinate of the attractor). Typically produced by RQA2_simulators.generate_test_battery().
n_surrogates (int, default 200) – Number of surrogates per algorithm per system.
save_path (str, default ‘surrogate_validation_results.png’) – Path where the validation heatmap is saved.
- Returns:
dict – Nested dict:
results[system_name][surrogate_method][metric] → p-value (float, or nan if the metric could not be computed).
- generate(kind: Literal['FT', 'AAFT', 'IAAFT', 'IDFS', 'WIAAFT', 'PPS'] = 'FT', *, n_surrogates: int = 200, **kwargs) ndarray[source]#
Generate an ensemble of surrogate time series.
Each surrogate is produced with an independent random seed to ensure statistical independence. When n_surrogates >= 50 and max_workers > 1 the ensemble is generated in parallel.
- Parameters:
kind ({‘FT’, ‘AAFT’, ‘IAAFT’, ‘IDFS’, ‘WIAAFT’, ‘PPS’}, default ‘FT’) – Surrogate algorithm:
FT – Fourier-Transform phase randomisation.
AAFT – Amplitude-Adjusted Fourier Transform.
IAAFT – Iterative AAFT (n_iter=100 by default).
IDFS – Iterative Digitally-Filtered Shuffled.
WIAAFT – Wavelet-based IAAFT.
PPS – Pseudo-Periodic Surrogate.
n_surrogates (int, default 200) – Number of surrogates to generate.
**kwargs – Passed to the underlying surrogate algorithm (e.g. n_iter=200 for IAAFT, wavelet='db4' for WIAAFT).
- Returns:
ndarray, shape (n_surrogates, N) – Array of surrogate time series, one per row.
- Raises:
KeyError – If kind is not a recognised algorithm name.
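The simplest of these algorithms, FT phase randomisation, can be sketched as follows. This is an illustrative version only; the class's exact seeding and edge handling are not reproduced.

```python
import numpy as np

def ft_surrogate(x, rng):
    """Randomise Fourier phases while preserving the amplitude spectrum."""
    spec = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spec.shape)
    phases[0] = 0.0                    # DC component must stay real
    if len(x) % 2 == 0:
        phases[-1] = 0.0               # Nyquist bin must stay real
    return np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n=len(x))

rng = np.random.default_rng(0)
x = rng.standard_normal(512)
s = ft_surrogate(x, rng)               # same power spectrum, scrambled phases
```

Any linear-autocorrelation statistic is preserved by construction, which is exactly the null hypothesis the FT surrogate encodes.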
- max_workers: int = 1#
- seed: int | None = None#
- signal: ndarray#
RQA2_ml#
- class RQA2_ml(rqa_kwargs: Dict[str, Any] | None = None)[source]#
Bases: object
Machine learning utilities built on top of RQA2 features.
This class provides a complete feature-engineering and benchmarking pipeline for time-series classification and clustering using Recurrence Quantification Analysis measures. It implements the nested cross-validation with best-subset feature selection procedure described in the SMdRQA paper, surrogate-based null baselines, permutation feature importance, and publication-ready visualisations.
- Parameters:
rqa_kwargs (dict, optional) – Default keyword arguments forwarded to RQA2 when constructing RQA objects internally (e.g. normalize, mi_method). Per-call overrides can be passed via rqa_kwargs in build_feature_table().
Examples
>>> ml = RQA2_ml()
>>> features = ml.build_feature_table(
...     [signal_a, signal_b], labels=["a", "b"],
...     window_size=100, window_step=20,
...     rqa_kwargs={"tau": 2, "m": 3, "eps": 0.3},
... )
>>> results = ml.nested_cv_benchmark(
...     features.drop(columns=["id", "label"]),
...     features["label"], model="knn",
... )
- build_feature_table(signals_or_dir, labels=None, *, window_size, window_step=1, window_stats=('mean', 'median', 'mode'), include_params=True, group_level_params=None, rqa_kwargs=None)[source]#
Build a feature table from RQA2 measures and windowed summaries.
For each signal the method computes:
Whole-signal RQA measures (recurrence rate, determinism, etc.).
Windowed RQA measures aggregated with the requested summary statistics (mean, median, mode). These columns are prefixed with win_ to avoid collisions.
Optionally the embedding parameters tau, m, eps.
- Parameters:
signals_or_dir (str, PathLike, ndarray, list, or tuple) – Input signals; see _load_signals().
labels (array-like, optional) – Class labels aligned with the signals. Required for supervised benchmarking later.
window_size (int) – Size of the square sliding window on the recurrence plot.
window_step (int, default 1) – Step size for the sliding window.
window_stats (tuple of str, default ('mean', 'median', 'mode')) – Aggregate statistics computed over windowed measures.
include_params (bool, default True) – Whether to include tau, m, eps as feature columns.
group_level_params (set or list of str, optional) – Subset of {'tau', 'm', 'eps'} to estimate once across all signals and then apply uniformly (useful for fair cross-system comparisons).
rqa_kwargs (dict, optional) – Per-call overrides merged on top of self.rqa_kwargs. Keys tau, m, eps are extracted and applied as manual parameter overrides rather than passed to the RQA2 constructor.
- Returns:
pandas.DataFrame – One row per signal with columns:
id, (label), whole-signal measures, win_* windowed summaries, and optionally tau, m, eps.
- cluster_stability(X, *, method='kmeans', n_clusters=2, n_bootstrap=100, subsample_fraction=0.8, scaler=True, random_state=42)[source]#
Assess cluster stability via bootstrap resampling.
For each bootstrap iteration a random subsample is drawn, the clustering model is fit on the subsample, and labels are predicted for the full dataset. Pairwise adjusted Rand indices between bootstrap label arrays measure how stable the clustering is.
- Parameters:
X (array-like of shape (n_samples, n_features))
method (str, default 'kmeans') – 'kmeans' or 'gmm'. Agglomerative clustering has no predict method and is not supported here.
n_clusters (int, default 2)
n_bootstrap (int, default 100)
subsample_fraction (float, default 0.8)
scaler (bool, default True)
random_state (int, default 42)
- Returns:
dict –
'mean_ari': float
'std_ari': float
'ari_scores': ndarray – pairwise ARI values.
'n_bootstrap': int
'method': str
'n_clusters': int
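The bootstrap-ARI idea can be sketched with scikit-learn (a minimal version under the same scheme; scaling and the GMM branch are omitted, and the setup data are synthetic):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(42)
# two well-separated blobs -> the clustering should be highly stable
X = np.vstack([rng.normal(0, 0.3, (60, 2)), rng.normal(3, 0.3, (60, 2))])

label_sets = []
for b in range(20):
    # fit on a random 80% subsample, then label the FULL dataset
    idx = rng.choice(len(X), size=int(0.8 * len(X)), replace=False)
    km = KMeans(n_clusters=2, n_init=10, random_state=b).fit(X[idx])
    label_sets.append(km.predict(X))

# pairwise adjusted Rand indices between bootstrap labelings
aris = [adjusted_rand_score(label_sets[i], label_sets[j])
        for i in range(len(label_sets)) for j in range(i + 1, len(label_sets))]
mean_ari = float(np.mean(aris))
```

ARI is invariant to label permutation, so relabelled but identical partitions still score 1.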
- static compare_scores(scores_a, scores_b, *, alternative='two-sided')[source]#
Wilcoxon signed-rank test with rank-biserial effect size.
- Parameters:
scores_a, scores_b (array-like) – Paired score arrays of equal length (e.g. from nested_cv_benchmark() and surrogate_baseline()).
alternative (str, default 'two-sided') – 'two-sided', 'greater', or 'less'.
- Returns:
dict –
'statistic': float – Wilcoxon W statistic.
'p_value': float
'effect_size': float – rank-biserial r.
'n': int – number of paired observations.
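A minimal version of this comparison with SciPy. The rank-biserial convention used here, r = (R⁺ − R⁻)/(R⁺ + R⁻) over the signed-rank sums, is one common choice and an assumption about the implementation:

```python
import numpy as np
from scipy.stats import rankdata, wilcoxon

def compare_paired(a, b, alternative="two-sided"):
    """Wilcoxon signed-rank test plus a rank-biserial effect size."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    d = (a - b)[(a - b) != 0]              # Wilcoxon drops zero differences
    stat, p = wilcoxon(a, b, alternative=alternative)
    ranks = rankdata(np.abs(d))
    r_plus = ranks[d > 0].sum()
    r_minus = ranks[d < 0].sum()
    r = (r_plus - r_minus) / (r_plus + r_minus)   # rank-biserial correlation
    return {"statistic": float(stat), "p_value": float(p),
            "effect_size": float(r), "n": int(d.size)}

# e.g. real CV accuracies vs a null baseline, paired per iteration
res = compare_paired([0.90, 0.85, 0.88, 0.92, 0.87, 0.91],
                     [0.70, 0.72, 0.69, 0.74, 0.71, 0.73])
```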
- static feature_importance(model, X, y, *, n_repeats=10, random_state=42)[source]#
Permutation feature importance on a fitted model.
- Parameters:
model (fitted sklearn estimator) – E.g. the best_model returned by supervised_benchmark().
X (array-like of shape (n_samples, n_features))
y (array-like of shape (n_samples,))
n_repeats (int, default 10)
random_state (int, default 42)
- Returns:
pandas.DataFrame – Columns:
feature, importance_mean, importance_std, sorted by descending importance.
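Permutation importance itself comes from scikit-learn; a self-contained sketch of the same output shape (synthetic data, hypothetical feature names):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
# feature 0 carries the class signal; feature 1 is pure noise
X = rng.standard_normal((200, 2))
y = (X[:, 0] > 0).astype(int)
model = RandomForestClassifier(random_state=0).fit(X, y)

imp = permutation_importance(model, X, y, n_repeats=10, random_state=42)
table = (pd.DataFrame({"feature": ["f0", "f1"],
                       "importance_mean": imp.importances_mean,
                       "importance_std": imp.importances_std})
         .sort_values("importance_mean", ascending=False))
```

Shuffling the informative column destroys accuracy, so it dominates the table; the noise column scores near zero.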
- nested_cv_benchmark(X, y, *, model='knn', outer_iterations=100, test_fraction=0.3333333333333333, inner_splits=2, inner_iterations=10, feature_selection='auto', max_subset_size=None, scaler=True, random_state=42)[source]#
Nested cross-validation with best-subset feature selection.
Implements the validation procedure described in the SMdRQA paper: the outer loop evaluates generalisation performance on held-out data, while the inner loop performs feature selection exclusively on the training fold to prevent data leakage.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Feature matrix (e.g. from
build_feature_table()).y (array-like of shape (n_samples,)) – Target labels. Must contain at least two unique classes.
model (str, default 'knn') – One of 'knn', 'svm', 'rf'.
outer_iterations (int, default 100) – Number of random stratified train/test splits.
test_fraction (float, default 1/3) – Fraction of data held out as the test set in each outer iteration.
inner_splits (int, default 2) – Number of folds in the inner stratified CV used for feature selection.
inner_iterations (int, default 10) – Number of repeats of the inner CV.
feature_selection ({'auto', 'exhaustive', 'forward', None}, default 'auto') – Feature selection strategy. 'auto' uses exhaustive search when the number of features is ≤ 12, otherwise forward sequential selection. None disables feature selection.
max_subset_size (int, optional) – Maximum number of features to select. Defaults to all.
scaler (bool, default True) – Whether to prepend a StandardScaler.
random_state (int, default 42) – Seed for reproducibility.
- Returns:
dict –
'accuracy': ndarray of shape (outer_iterations,) – Test-set accuracy per outer iteration.
'roc_auc': ndarray of shape (outer_iterations,) – Test-set ROC AUC per outer iteration.
'selected_features': list of tuple – Feature indices selected in each outer iteration.
'feature_names': list of str or None – Column names if X is a DataFrame.
'feature_frequency': Series – How often each feature was selected across iterations.
'model': str – The model name used.
- static plot_benchmark_results(results, *, baseline=None, save_path=None, title=None)[source]#
Box plots of nested CV score distributions.
- Parameters:
results (dict) – Output of nested_cv_benchmark().
baseline (dict, optional) – Output of surrogate_baseline(). If given, null distributions are overlaid as grey box plots.
save_path (str, optional)
title (str, optional)
- Returns:
matplotlib.figure.Figure
- static plot_cluster_scatter(X, labels, *, method='pca', save_path=None, title=None)[source]#
2-D scatter plot coloured by cluster/class labels.
Dimensionality is reduced to two components via PCA before plotting.
- Parameters:
X (array-like of shape (n_samples, n_features))
labels (array-like of shape (n_samples,))
method (str, default 'pca') – Currently only 'pca' is supported.
save_path (str, optional)
title (str, optional)
- Returns:
matplotlib.figure.Figure
- static plot_cluster_validity(results_df, *, save_path=None, title=None)[source]#
Line plots of clustering validity indices vs k.
- Parameters:
results_df (pandas.DataFrame) – Output of unsupervised_benchmark().
save_path (str, optional)
title (str, optional)
- Returns:
matplotlib.figure.Figure
- static plot_confusion_matrix(y_true, y_pred, *, labels=None, save_path=None, title=None)[source]#
Annotated confusion-matrix heatmap.
- Parameters:
y_true, y_pred (array-like)
labels (list of str, optional)
save_path (str, optional)
title (str, optional)
- Returns:
matplotlib.figure.Figure
- static plot_feature_importance(importance_df, *, top_n=15, save_path=None, title=None)[source]#
Horizontal bar chart of feature importance.
- Parameters:
importance_df (pandas.DataFrame) – Output of feature_importance().
top_n (int, default 15)
save_path (str, optional)
title (str, optional)
- Returns:
matplotlib.figure.Figure
- static plot_surrogate_cluster_validation(results, *, metric='silhouette', save_path=None, title=None)[source]#
Visualise surrogate null distributions for clustering validity.
For each surrogate type and clustering method, shows the null distribution of the selected validity metric as a violin plot with the real-data value overlaid.
- Parameters:
results (dict) – Output of surrogate_cluster_validation().
metric (str, default 'silhouette') – Validity metric to plot ('silhouette', 'calinski_harabasz', or 'davies_bouldin').
save_path (str, optional)
title (str, optional)
- Returns:
matplotlib.figure.Figure
- static plot_surrogate_null_results(results, *, save_path=None, title=None)[source]#
Multi-panel visualisation of surrogate null benchmarking.
Displays the real classification performance against the null distribution from each surrogate type (and optionally the label-permutation null). Each panel shows violin plots of the null accuracy distributions with the real mean accuracy overlaid as a horizontal line, annotated with the FDR-corrected p-value.
- Parameters:
results (dict) – Output of surrogate_null_benchmark().
save_path (str, optional)
title (str, optional)
- Returns:
matplotlib.figure.Figure
- supervised_benchmark(X, y, *, models=('knn', 'svm', 'rf'), cv=5, scaler=True, random_state=42)[source]#
Quick supervised benchmark with stratified cross-validation.
This is a lighter alternative to nested_cv_benchmark() for exploratory analysis. It does not perform feature selection; for paper-quality validation use nested_cv_benchmark() instead.
After cross-validation the best-performing model (by accuracy, then macro-F1 as tiebreaker) is refit on the full dataset and returned alongside the per-model results table.
- Parameters:
X (array-like of shape (n_samples, n_features))
y (array-like of shape (n_samples,))
models (tuple of str, default ('knn', 'svm', 'rf'))
cv (int, default 5)
scaler (bool, default True)
random_state (int, default 42)
- Returns:
results_df (pandas.DataFrame) – Columns: model, accuracy_mean, accuracy_std, f1_macro_mean, f1_macro_std, roc_auc_mean, roc_auc_std.
best_model (sklearn estimator) – Best classifier refit on all of X and y.
- surrogate_baseline(X, y, *, n_permutations=100, model='knn', cv=5, scaler=True, random_state=42)[source]#
Null performance distribution by permuting labels.
Shuffles y n_permutations times and evaluates each with stratified k-fold CV, yielding a null distribution against which real performance can be compared via compare_scores().
- Parameters:
X (array-like of shape (n_samples, n_features))
y (array-like of shape (n_samples,))
n_permutations (int, default 100)
model (str, default 'knn')
cv (int, default 5)
scaler (bool, default True)
random_state (int, default 42)
- Returns:
dict –
'null_accuracy': ndarray of shape (n_permutations,)
'null_roc_auc': ndarray of shape (n_permutations,)
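The label-permutation null is simple to sketch with scikit-learn (an illustrative version on synthetic, label-independent features; the class's exact CV configuration is not reproduced):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(42)
X = rng.standard_normal((120, 4))
y = np.repeat([0, 1], 60)                  # labels unrelated to the features

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
null_acc = []
for _ in range(30):
    y_perm = rng.permutation(y)            # break any label-feature link
    scores = cross_val_score(KNeighborsClassifier(), X, y_perm, cv=cv)
    null_acc.append(scores.mean())
null_acc = np.asarray(null_acc)
# for two balanced classes the null distribution centres near chance (~0.5)
```

Real performance is then compared against this distribution, e.g. via compare_scores().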
- surrogate_cluster_validation(signals, *, window_size, window_step=1, window_stats=('mean', 'median', 'mode'), include_params=True, group_level_params=None, rqa_kwargs=None, surrogate_kinds=('FT', 'AAFT', 'IAAFT'), n_surrogate_iterations=20, surrogate_kwargs=None, methods=('kmeans', 'gmm', 'agglo'), n_clusters=None, k_range=(2, 6), scaler=True, random_state=42, alpha=0.05, correction='fdr_bh', verbose=True)[source]#
Surrogate-based null hypothesis testing for clustering.
Evaluates whether the clustering structure found in RQA features exceeds what would be expected from data whose nonlinear dynamics have been destroyed. For each surrogate algorithm, surrogate time series are generated, RQA features are extracted, and clustering validity indices are computed. The resulting null distributions are compared against the real-data validity indices via rank-based p-values with multiple comparison correction.
- Parameters:
signals (list of ndarray) – Raw time-series signals.
window_size (int) – Forwarded to build_feature_table().
window_step (int, default 1)
window_stats (tuple of str, default ('mean', 'median', 'mode'))
include_params (bool, default True)
group_level_params (set, optional)
rqa_kwargs (dict, optional)
surrogate_kinds (tuple of str, default ('FT', 'AAFT', 'IAAFT'))
n_surrogate_iterations (int, default 20)
surrogate_kwargs (dict, optional)
methods (tuple of str, default ('kmeans', 'gmm', 'agglo'))
n_clusters (int, optional)
k_range (tuple, default (2, 6))
scaler (bool, default True)
random_state (int, default 42)
alpha (float, default 0.05)
correction (str, default 'fdr_bh')
verbose (bool, default True)
- Returns:
dict –
'real': dict – validity indices from real data (best silhouette per method).
'surrogates': dict of {str: dict} – per-surrogate-kind null distributions and p-values for each validity index.
'corrected_p_values': dict – FDR-corrected p-values.
'summary': DataFrame – one-row-per-surrogate summary.
- surrogate_null_benchmark(signals, labels, *, window_size, window_step=1, window_stats=('mean', 'median', 'mode'), include_params=True, group_level_params=None, rqa_kwargs=None, surrogate_kinds=('FT', 'AAFT', 'IAAFT'), n_surrogate_iterations=20, surrogate_kwargs=None, model='knn', outer_iterations=100, surrogate_outer_iterations=30, test_fraction=0.3333333333333333, inner_splits=2, inner_iterations=10, feature_selection='auto', max_subset_size=None, scaler=True, random_state=42, alpha=0.05, correction='fdr_bh', include_permutation=True, n_permutations=100, verbose=True)[source]#
Surrogate-based null hypothesis testing for classification.
For each surrogate algorithm the method replaces every original time series with a surrogate, re-extracts RQA features, and runs the full nested cross-validation procedure. Repeating this n_surrogate_iterations times yields a null distribution of classification accuracy that would be obtained from data whose nonlinear structure has been destroyed according to the specific null hypothesis encoded by the surrogate type:
FT – preserves power spectrum (null: linear autocorrelation alone explains classification).
AAFT – preserves amplitude distribution + spectrum (null: linear structure + marginal distribution suffice).
IAAFT – tighter amplitude + spectrum match.
WIAAFT – preserves time-frequency structure.
PPS – preserves periodic structure (null: periodicity alone suffices).
Optionally a label-permutation baseline (the classical approach) is computed alongside for comparison. All p-values are corrected for multiple comparisons using Benjamini-Hochberg FDR.
- Parameters:
signals (list of ndarray) – Raw time-series signals (same order as labels).
labels (array-like) – Class labels aligned with signals.
window_size (int) – Forwarded to build_feature_table().
window_step (int, default 1)
window_stats (tuple of str, default ('mean', 'median', 'mode'))
include_params (bool, default True)
group_level_params (set, optional)
rqa_kwargs (dict, optional)
surrogate_kinds (tuple of str, default ('FT', 'AAFT', 'IAAFT')) – Surrogate algorithms to test.
n_surrogate_iterations (int, default 20) – How many surrogate datasets to generate per algorithm.
surrogate_kwargs (dict, optional) – Extra keyword arguments for surrogate generation.
model (str, default 'knn')
outer_iterations (int, default 100) – Outer CV iterations for the real data.
surrogate_outer_iterations (int, default 30) – Outer CV iterations for each surrogate dataset (lower for computational tractability).
test_fraction (float, default 1/3)
inner_splits (int, default 2)
inner_iterations (int, default 10)
feature_selection (str or None, default 'auto')
max_subset_size (int, optional)
scaler (bool, default True)
random_state (int, default 42)
alpha (float, default 0.05) – Significance level for FDR correction.
correction (str, default 'fdr_bh') – 'fdr_bh' for Benjamini-Hochberg; 'bonferroni' for Bonferroni correction.
include_permutation (bool, default True) – Whether to also compute a label-permutation null.
n_permutations (int, default 100)
verbose (bool, default True) – Show progress bars.
- Returns:
dict –
'real': dict – output of nested_cv_benchmark() on the real data.
'surrogates': dict of {str: dict} – per-surrogate-kind results containing 'null_accuracies', 'null_roc_aucs', 'p_value_accuracy', 'p_value_roc_auc', 'effect_size'.
'permutation': dict or None – label-permutation null (if include_permutation is True).
'corrected_p_values': dict – FDR-corrected p-values per surrogate kind and metric.
'summary': DataFrame – one-row-per-null summary table.
- unsupervised_benchmark(X, *, y_true=None, methods=('kmeans', 'gmm', 'agglo'), n_clusters=None, k_range=(2, 6), scaler=True, random_state=42)[source]#
Evaluate clustering with multiple validity indices.
Reports silhouette, Calinski–Harabasz, and Davies–Bouldin scores for each method × k combination. When ground-truth labels y_true are provided, the adjusted Rand index is also computed.
- Parameters:
X (array-like of shape (n_samples, n_features))
y_true (array-like, optional) – Ground-truth labels for computing adjusted Rand index.
methods (tuple of str, default ('kmeans', 'gmm', 'agglo'))
n_clusters (int, optional) – Fixed cluster count. Overrides k_range.
k_range (tuple of (int, int), default (2, 6))
scaler (bool, default True)
random_state (int, default 42)
- Returns:
results_df (pandas.DataFrame) – One row per method × k with columns: method, n_clusters, silhouette, calinski_harabasz, davies_bouldin, and optionally adjusted_rand.
labels_out (dict of {str: ndarray}) – Best cluster labels (by silhouette) for each method.