ddmra.workflows.run_analyses

run_analyses(files, qc, out_dir='.', confounds=None, n_iters=10000, n_jobs=1, qc_thresh=0.2, window=1000, analyses=('qcrsfc', 'highlow', 'scrubbing'), verbose=False, pca_threshold=None, outlier_threshold=None, atlas='power_2011', sphere_radius=5.0, run_covariates=None, run_denoising_metrics=None, highlow_cut=0.5)

Run scrubbing, high-low motion, and QCRSFC analyses.

Parameters:
  • files ((N,) list of nifti files) – List of 4D (X x Y x Z x T) images in MNI space.

  • qc ((N,) list of array_like) – List of 1D (T) numpy arrays with QC metric values per img (e.g., FD or respiration).

  • out_dir (str, optional) – Output directory. Default is current directory.

  • confounds (None or (N,) list of array-like, optional) – List of 2D (T x C) numpy arrays with confounds per img, with one row per volume and one column per confound. Default is None (no confounds are removed).

  • n_iters (int, optional) – Number of iterations to run to generate null distributions. Default is 10000.

  • n_jobs (int, optional) – The number of CPUs to use to do the computation. -1 means ‘all CPUs’. Default is 1.

  • qc_thresh (float, optional) – Threshold for QC metric used in scrubbing analysis. Default is 0.2 (for FD).

  • window (int, optional) – Number of units (pairs of ROIs) to include when averaging to generate smoothing curve. Default is 1000.

  • analyses (tuple, optional) – The analyses to run. Must be one or more of “qcrsfc”, “highlow”, “scrubbing”. Default is all three.

  • verbose (bool, optional) – If verbose, write out the correlation coefficients used by the QC:RSFC and high-low analyses. Default is False.

  • pca_threshold (None or float or int, optional) – If None, do not perform outlier detection at all. If a float, perform PCA and retain the components that explain that proportion of the variance. If an int, perform PCA and retain that number of components.

  • outlier_threshold (None or float, optional) – If None, do not perform outlier detection at all. If a float, flag any run whose Mahalanobis distance p-value, evaluated against a chi-squared distribution, is below that value.

  • atlas (str or path-like, optional) – Atlas to use for ROI time series extraction. If a path to an existing file, the file is treated as a labels image and loaded with nilearn.maskers.NiftiLabelsMasker. Otherwise, the value is treated as the name of a coordinate atlas available through nilearn.datasets, such as "power_2011", "dosenbach_2010", or "seitzman_2018", and loaded with nilearn.maskers.NiftiSpheresMasker. Default is "power_2011".

  • sphere_radius (float, optional) – Radius in millimeters for sphere atlases. Ignored when atlas is a labels image. Default is 5.0.

  • run_covariates (None or pandas.DataFrame, optional) – Run-level covariates to adjust for in the QC:RSFC analysis. Rows must correspond to files in order. Numeric columns are used directly, and categorical columns are dummy-coded with one reference level. Default is None.

  • run_denoising_metrics (None or pandas.DataFrame, optional) – Run-level denoising and data-loss metrics to include in run_denoising_summary.tsv. Rows must correspond to files in order, and columns must be numeric. Useful columns include upstream retained-volume counts, censored-volume counts, or temporal degrees of freedom after denoising. Default is None.

  • highlow_cut (float, optional) – Fraction of runs assigned to each extreme QC group in the high-low analysis, in (0, 0.5]. 0.5 (default) is a median split; smaller values (e.g., 0.25 for top vs bottom quartiles) contrast the QC extremes and drop the middle runs. See ddmra.analysis.highlow_analysis().
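The effect of highlow_cut on group membership can be illustrated with a short NumPy sketch. This is not the package's implementation: the helper name split_high_low, the use of the mean QC value per run, and the floor-based group size are all assumptions made for the example.

```python
import numpy as np


def split_high_low(mean_qc, highlow_cut=0.5):
    """Return index arrays for the low- and high-QC groups (illustrative only)."""
    if not 0 < highlow_cut <= 0.5:
        raise ValueError("highlow_cut must be in (0, 0.5]")
    mean_qc = np.asarray(mean_qc, dtype=float)
    # Assumption: each extreme group receives floor(n_runs * highlow_cut) runs.
    n_per_group = int(np.floor(mean_qc.size * highlow_cut))
    # Sort runs by ascending QC value (e.g., mean FD).
    order = np.argsort(mean_qc, kind="stable")
    low_idx = order[:n_per_group]
    high_idx = order[mean_qc.size - n_per_group:]
    return low_idx, high_idx


# Hypothetical mean FD values for eight runs.
mean_fd = [0.10, 0.45, 0.22, 0.31, 0.08, 0.60, 0.15, 0.27]

# Quartile contrast: two lowest-motion runs vs. two highest-motion runs;
# the four middle runs are dropped.
low, high = split_high_low(mean_fd, highlow_cut=0.25)
```

With highlow_cut=0.5 the same helper assigns every run to one of the two groups, which corresponds to the median split described above.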

Notes

At least MIN_SUBJECTS (10) runs must be retained for analysis, or a ValueError is raised. When the QC:RSFC or high-low analyses are requested and fewer than QCFC_STABILITY_N (30) runs are retained, a UserWarning is issued because QC-FC estimates are unstable in small samples (Parkes et al., 2018; Ciric et al., 2017); the analyses still run, but their intercept and slope summaries should be interpreted with caution.
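The two sample-size rules in this note can be sketched as a small pre-flight check to run before launching a long workflow. The helper check_run_count is hypothetical and only mirrors the documented behaviour; the constant values are the ones quoted above, and the message wording is invented for the example.

```python
import warnings

MIN_SUBJECTS = 10      # minimum retained runs, as stated in the note
QCFC_STABILITY_N = 30  # below this, QC-FC estimates are flagged as unstable


def check_run_count(n_retained, analyses=("qcrsfc", "highlow", "scrubbing")):
    """Hypothetical helper mirroring the documented checks (not ddmra's code)."""
    if n_retained < MIN_SUBJECTS:
        raise ValueError(
            f"{n_retained} runs retained; at least {MIN_SUBJECTS} are required."
        )
    # The warning only applies when QC:RSFC or high-low analyses are requested.
    needs_stability = bool({"qcrsfc", "highlow"} & set(analyses))
    if needs_stability and n_retained < QCFC_STABILITY_N:
        warnings.warn(
            f"Only {n_retained} runs retained; QC-FC estimates may be unstable.",
            UserWarning,
        )
    return n_retained


# Example: 18 retained runs pass the hard minimum but trigger the warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_run_count(18)
n_warnings = len(caught)
```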

This function writes out several files to out_dir:

  • analysis_values.tsv.gz: Raw analysis values for analyses.

    Has four columns: distance, qcrsfc, scrubbing, and highlow.

  • smoothing_curves.tsv.gz: Smoothing curve information for analyses.

    Has four columns: distance, qcrsfc, scrubbing, and highlow.

  • null_smoothing_curves.npz:

    Null smoothing curves from each analysis. Contains three 2D arrays, each with one row per permutation iteration and one column per entry in the distance column of smoothing_curves.tsv.gz, in the same order. The three arrays’ keys are ‘qcrsfc’, ‘highlow’, and ‘scrubbing’.

  • ranks.tsv.gz: Diagnostic edgewise ranks of the observed analysis values against the edgewise null distributions. These ranks are not inferential p-values.

  • qcrsfc_summary.tsv (only when the qcrsfc analysis is requested):

    Descriptive QC-FC benchmark summaries, including the median absolute QC-FC correlation and the percentage of edges with a significant QC-FC correlation (Ciric et al., 2017; Parkes et al., 2018). These are diagnostics, not the package’s inferential result.

  • run_denoising_summary.tsv: Run-level volume, confound-regressor, retention, and optional user-provided tDOF/data-loss accounting.

  • [analysis]_analysis.png: Figure for each analysis.

If verbose is True:

  • z_corrs.tsv.gz: Z-transformed correlation coefficients for the good files, used by QC:RSFC and high-low analyses.

  • mean_qcs.tsv.gz: Mean QC values for the good files, used by QC:RSFC and high-low analyses.
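The layout of null_smoothing_curves.npz described above can be read back with NumPy. The sketch below writes a toy stand-in file with random values so that it runs without real outputs; only the file name and the three keys come from the description above. The percentile band is an illustrative summary and is not claimed to be how the package derives its results.

```python
import tempfile
from pathlib import Path

import numpy as np

rng = np.random.default_rng(0)
n_iters, n_distances = 100, 50  # toy sizes; real files have n_iters rows
keys = ("qcrsfc", "highlow", "scrubbing")

with tempfile.TemporaryDirectory() as out_dir:
    npz_path = Path(out_dir) / "null_smoothing_curves.npz"
    # Stand-in for the real output: one (n_iters, n_distances) array per key.
    np.savez(
        str(npz_path),
        **{key: rng.normal(size=(n_iters, n_distances)) for key in keys},
    )

    # Rows are permutation iterations; columns follow the distance column
    # of smoothing_curves.tsv.gz.
    with np.load(str(npz_path)) as nulls:
        bands = {
            key: np.percentile(nulls[key], [2.5, 97.5], axis=0)
            for key in keys
        }

# Each band has shape (2, n_distances): lower and upper percentile curves.
```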