Methods and interpretation

ddmra evaluates residual distance-dependent motion-related artifact in resting-state fMRI connectivity estimates. The package is designed for benchmarking denoising or preprocessing strategies, not for estimating a neuroscientific connectivity effect. A good denoising strategy should reduce associations between a run-level quality-control metric and functional connectivity while preserving enough temporal degrees of freedom and data for the downstream scientific analysis.

The workflow follows the broad evaluation logic used in resting-state fMRI denoising benchmarks, where residual motion artifact is assessed with QC-FC associations, the distance dependence of those associations, scrubbing-related connectivity changes, and an accounting of data loss or temporal degrees of freedom. See, for example, Power et al. (2012), Power et al. (2018), Ciric et al. (2017), and Parkes et al. (2018).

The DDMRA analyses implemented here are based primarily on Power et al. (2018), PNAS. ddmra is not a line-for-line reproduction of the original analysis scripts. It keeps the core scientific target of the original method, namely evaluating whether motion-related effects on functional connectivity vary with the physical distance between regions, but makes several implementation choices intended to make the analyses reusable for modern denoising comparisons.

Differences from Power et al. (2018)

ddmra differs from the original Power et al. DDMRA implementation in the following ways.

Generalized inputs and atlases: Power et al. evaluated a specific analysis setting. ddmra accepts arbitrary 4D NIfTI runs and either a local labels-image atlas or selected Nilearn coordinate atlases. The rationale is to support controlled comparisons among preprocessing or denoising pipelines, as long as all inputs are in the same space and resolution.
Scrubbing correlations are Fisher-z-transformed: In the scrubbing analysis, ddmra Fisher-z-transforms the full and scrubbed correlation coefficients before subtracting and averaging them. The original Power implementation did not apply this transform in the same way. The rationale is to keep all implemented connectivity summaries on a Fisher-z scale and to avoid averaging raw correlations directly.
Scrubbing uses the opposite sign convention: The original scrubbing contrast and the ddmra contrast have opposite signs. ddmra computes connectivity from the full time series minus connectivity from the scrubbed time series. The rationale is interpretive consistency: under this convention, larger positive short-distance effects point in the same artifact-like direction across QC-FC, high-low QC, and scrubbing summaries.
Inference is performed on smoothing-curve summaries: ddmra treats edgewise ranks as diagnostics only. Inferential p-values are computed for the smoothing-curve intercept and slope against permutation null smoothing curves, using a plus-one finite-permutation correction. The rationale is to make inference target the reported distance-dependence summaries rather than treating edgewise ranks as p-values.
QC-FC can adjust for run-level covariates: ddmra can residualize mean QC and edgewise connectivity with respect to run-level covariates before estimating QC-FC. The rationale is that denoising evaluations can be biased when motion is associated with site, age, diagnosis, acquisition, or other run-level variables.
Data loss and temporal degrees of freedom are explicit outputs: ddmra writes run_denoising_summary.tsv with retained-run flags, volume counts, confound counts, nominal temporal degrees of freedom after confounds, and optional user-supplied denoising metrics. The rationale is that lower residual motion artifact is not automatically better if it is achieved by excessive volume loss or loss of temporal degrees of freedom.
Pipeline comparisons use paired label-swap tests: The pipeline-comparison workflow adds direct pairwise tests that were not part of the original single-pipeline DDMRA method. These tests randomly swap pipeline labels within run and compare the pipeline difference in intercept and slope. The rationale is that pipelines are applied to the same runs, so direct paired inference is more appropriate than comparing independent per-pipeline p-values.

Single-pipeline workflow

The ddmra.workflows.run_analyses() workflow takes a list of 4D NIfTI files and a matching list of one-dimensional QC arrays, one array per run. The QC array is usually framewise displacement, but may be another time-resolved quality measure if it is defined for every volume in the run. All images should be in the same space and resolution and should be compatible with the selected atlas.

The workflow performs the following steps:

Validate that each QC array is one-dimensional, non-empty, and finite.
Extract ROI time series with either a labels-image atlas or a coordinate sphere atlas.
Compute ROI-to-ROI distances and sort edges from short to long distance.
Build Fisher-z-transformed functional connectivity matrices for analyses that use run-level connectivity.
Drop runs with NaN ROI time series or zero-variance ROI time series.
Optionally identify multivariate connectivity outliers using PCA scores and a robust covariance estimator.
Compute the selected artifact analyses.
Smooth edgewise analysis values over distance and assess the smoothing curve intercept and slope against permutation null distributions.
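
The distance step in the list above can be illustrated with a small NumPy sketch. It assumes ROI centroid coordinates in millimeters are already available and uses Euclidean distance between centroids; the function name `sorted_edge_distances` and the example coordinates are hypothetical and are not part of the ddmra API.

```python
import numpy as np


def sorted_edge_distances(coords):
    """Return edge indices and distances, sorted from short to long distance.

    coords: (n_rois, 3) array of ROI centroid coordinates in mm (hypothetical input).
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between ROI centroids.
    diffs = coords[:, None, :] - coords[None, :, :]
    dist_matrix = np.sqrt((diffs**2).sum(axis=-1))
    # Keep each undirected edge once (upper triangle, excluding the diagonal).
    rows, cols = np.triu_indices(coords.shape[0], k=1)
    distances = dist_matrix[rows, cols]
    order = np.argsort(distances, kind="stable")
    return rows[order], cols[order], distances[order]


# Four made-up ROI centroids.
coords = np.array([[0, 0, 0], [30, 0, 0], [0, 40, 0], [0, 0, 100]])
rows, cols, distances = sorted_edge_distances(coords)
```

Every edgewise analysis value can then be reordered with the same index order so that all analyses share one distance axis.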

The workflow writes run_denoising_summary.tsv for every run. This file is important for interpretation because apparent denoising gains can be coupled to data loss, temporal degrees of freedom, or complete removal of difficult runs. Direct scientific comparisons of denoising strategies should consider these columns alongside the artifact metrics.

QC-FC analysis

QC-FC measures the association between mean run QC and each functional connectivity edge across runs. For each retained run, ddmra averages the QC time series to one run-level value and computes Fisher-z-transformed ROI correlations. For each edge, it then correlates run-level QC with the edge’s connectivity values across runs and Fisher-z-transforms that QC-FC correlation.

If run_covariates are supplied, QC-FC is computed after residualizing both the run-level QC vector and edgewise connectivity values with respect to those covariates. This is useful when age, site, group, acquisition, or other run-level factors could otherwise confound the QC-FC estimate.
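
As a minimal sketch of the computation described above, the following NumPy code correlates run-level QC with each edge across runs and optionally residualizes both against covariates by ordinary least squares. The function names, the intercept term in the design matrix, and the synthetic data are assumptions made for illustration; ddmra's internal implementation may differ.

```python
import numpy as np


def residualize(y, covariates):
    """Remove the least-squares fit of covariates (plus an intercept) from y.

    y: (n_runs,) or (n_runs, n_edges); covariates: (n_runs, n_covariates).
    """
    design = np.column_stack([np.ones(len(covariates)), covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


def qcfc(mean_qc, z_conn, covariates=None):
    """Fisher-z-transformed correlation between run-level QC and each edge.

    mean_qc: (n_runs,) mean QC per run.
    z_conn: (n_runs, n_edges) Fisher-z connectivity per run.
    """
    mean_qc = np.asarray(mean_qc, dtype=float)
    z_conn = np.asarray(z_conn, dtype=float)
    if covariates is not None:
        covariates = np.asarray(covariates, dtype=float).reshape(len(mean_qc), -1)
        mean_qc = residualize(mean_qc, covariates)
        z_conn = residualize(z_conn, covariates)
    # Pearson correlation of QC with every edge across runs.
    qc_c = mean_qc - mean_qc.mean()
    conn_c = z_conn - z_conn.mean(axis=0)
    r = (qc_c @ conn_c) / (np.linalg.norm(qc_c) * np.linalg.norm(conn_c, axis=0))
    return np.arctanh(r)


# Synthetic data: a binary "site" covariate shifts both QC and connectivity.
rng = np.random.default_rng(0)
n_runs, n_edges = 40, 5
site = rng.integers(0, 2, size=n_runs)
mean_qc = 0.2 + 0.1 * site + 0.1 * rng.random(n_runs)
z_conn = rng.normal(size=(n_runs, n_edges)) + 0.5 * site[:, None]
qcfc_raw = qcfc(mean_qc, z_conn)
qcfc_adj = qcfc(mean_qc, z_conn, covariates=site)
```

In this synthetic case the covariate drives both QC and connectivity, so the unadjusted values mix the site effect into QC-FC while the adjusted values do not.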

Interpretation:

Values near zero indicate little linear association between run quality and that connectivity edge.
A distance-dependent curve with stronger short-distance effects than long-distance effects is consistent with residual motion-related artifact.
QC-FC is not a measure of neural signal preservation. It should be interpreted with data-loss and reliability or validity benchmarks when available.

ddmra also writes two descriptive QC-FC benchmark summaries to qcrsfc_summary.tsv: the median absolute QC-FC correlation and the percentage of edges with a significant QC-FC correlation (two-sided, uncorrected, at alpha = 0.05). These are the standard scalar QC-FC summaries reported in the denoising literature (Ciric et al., 2017; Parkes et al., 2018). Lower values indicate less residual association between run quality and connectivity, and under a pipeline with no residual QC-FC the percentage of significant edges should approach 100 * alpha, that is, 5% at alpha = 0.05. These summaries are descriptive diagnostics; inference in ddmra is based on the smoothing-curve intercept and slope.
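
The two scalar summaries can be sketched as follows. This example assumes raw (not Fisher-z) QC-FC correlations as input and tests each edge with the Fisher-z normal approximation, which is an illustrative choice and may not match the test ddmra applies; `qcfc_summaries` is a hypothetical helper name.

```python
import math

import numpy as np


def qcfc_summaries(r, n_runs, alpha=0.05):
    """Median absolute QC-FC correlation and percentage of significant edges.

    r: (n_edges,) raw QC-FC correlations across n_runs runs.
    Significance uses the Fisher-z normal approximation (an assumption here).
    """
    r = np.asarray(r, dtype=float)
    z_stat = np.arctanh(r) * math.sqrt(n_runs - 3)
    # Two-sided, uncorrected p-value under a standard normal reference.
    p = np.array([math.erfc(abs(z) / math.sqrt(2)) for z in z_stat])
    median_abs = float(np.median(np.abs(r)))
    pct_significant = float(100 * np.mean(p < alpha))
    return median_abs, pct_significant


# Made-up correlations for five edges across 50 runs.
r = np.array([0.05, -0.10, 0.40, -0.02, 0.15])
median_abs_r, pct_sig = qcfc_summaries(r, n_runs=50)
```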

High-low QC analysis

The high-low analysis splits retained runs into high-QC and low-QC groups by mean QC. For each edge, it subtracts the mean connectivity of the low-QC group from the mean connectivity of the high-QC group. The split is controlled by the highlow_cut fraction: 0.5 (default) is a median split that uses every run, while smaller values (for example 0.25 for the top and bottom quartiles) contrast the QC extremes and drop the middle runs. Extreme-group contrasts are more sensitive to motion effects but use fewer runs.
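
The split can be sketched in NumPy as below. Group sizes are taken as floor(highlow_cut * n_runs), so with an odd number of runs this sketch drops the middle run even at 0.5; how ddmra rounds group sizes or breaks ties is not stated here, so treat that detail, and the helper name `highlow_difference`, as assumptions.

```python
import numpy as np


def highlow_difference(mean_qc, z_conn, highlow_cut=0.5):
    """Mean connectivity of the high-QC group minus that of the low-QC group.

    mean_qc: (n_runs,) mean QC per run.
    z_conn: (n_runs, n_edges) Fisher-z connectivity per run.
    """
    mean_qc = np.asarray(mean_qc, dtype=float)
    z_conn = np.asarray(z_conn, dtype=float)
    n_group = int(np.floor(highlow_cut * len(mean_qc)))
    if n_group < 1:
        raise ValueError("highlow_cut leaves no runs in each group")
    # Sort runs by mean QC and take the lowest and highest n_group runs.
    order = np.argsort(mean_qc, kind="stable")
    low_idx, high_idx = order[:n_group], order[-n_group:]
    return z_conn[high_idx].mean(axis=0) - z_conn[low_idx].mean(axis=0)


# Four made-up runs with a single edge.
mean_qc = np.array([0.1, 0.4, 0.2, 0.3])
z_conn = np.array([[0.0], [1.0], [0.2], [0.6]])
median_split = highlow_difference(mean_qc, z_conn, highlow_cut=0.5)
quartile_split = highlow_difference(mean_qc, z_conn, highlow_cut=0.25)
```

The quartile contrast uses only the two most extreme runs in this toy example, which illustrates the trade-off between sensitivity and the number of runs used.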

Interpretation:

Values reflect the edgewise difference between higher-motion and lower-motion runs.
The analysis is intentionally simple and is useful as a complementary artifact benchmark.
Because it depends on a group split, it should not be treated as a substitute for covariate-adjusted QC-FC when continuous QC information and covariates are important.

Scrubbing analysis

The scrubbing analysis compares connectivity before and after removing volumes whose QC values exceed qc_thresh. Within each run, the workflow computes connectivity from the full time series and from the scrubbed time series, then averages edgewise differences across retained runs. Runs are included in the scrubbing analysis only when at least one volume is scrubbed and at least half of the volumes remain.

ddmra uses the sign convention full minus scrubbed: connectivity computed from the full time series minus connectivity computed from the scrubbed time series. This convention is the opposite of the original Power et al. implementation, but it keeps the direction of larger positive DDMRA effects consistent across the implemented analyses.

Because the scrubbing analysis Fisher-z-transforms raw connectivity, near-perfect short-distance edge correlations are clipped to +/-0.999 before the transform to keep the Fisher-z values finite. The number of clipped full and scrubbed edge correlations is reported in the run log so that any compression of the most extreme edges is visible.
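
A per-run version of this contrast might look like the sketch below. It applies the inclusion rule, the full-minus-scrubbed sign convention, and the clipping described above; the function name and the synthetic data are hypothetical, and the averaging across runs is omitted.

```python
import numpy as np


def scrubbing_difference(ts, qc, qc_thresh, clip=0.999):
    """Fisher-z connectivity from the full run minus that from the scrubbed run.

    ts: (n_volumes, n_rois) ROI time series; qc: (n_volumes,) QC values.
    Returns None when the run would be excluded from the scrubbing analysis.
    """
    ts = np.asarray(ts, dtype=float)
    keep = np.asarray(qc, dtype=float) <= qc_thresh
    # Require at least one scrubbed volume and at least half of volumes kept.
    if keep.all() or keep.sum() < 0.5 * len(keep):
        return None
    rows, cols = np.triu_indices(ts.shape[1], k=1)
    r_full = np.corrcoef(ts, rowvar=False)[rows, cols]
    r_scrub = np.corrcoef(ts[keep], rowvar=False)[rows, cols]
    # Clip near-perfect correlations so that arctanh stays finite.
    z_full = np.arctanh(np.clip(r_full, -clip, clip))
    z_scrub = np.arctanh(np.clip(r_scrub, -clip, clip))
    return z_full - z_scrub


# Synthetic run: 100 volumes, 4 ROIs, ten volumes above the QC threshold.
rng = np.random.default_rng(1)
ts = rng.normal(size=(100, 4))
qc = np.full(100, 0.1)
qc[:10] = 0.9
delta_z = scrubbing_difference(ts, qc, qc_thresh=0.5)
```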

Interpretation:

Larger effects indicate connectivity changes associated with removing volumes whose QC values exceed qc_thresh (for framewise displacement, the high-motion volumes).
Scrubbing results are conditional on the selected QC threshold and the availability of runs with both retained and scrubbed volumes.
A method that reduces scrubbing-related effects by discarding many volumes should be evaluated together with temporal degrees of freedom and retained volume counts.

Distance smoothing, intercepts, and slopes

All three analyses produce edgewise values ordered by ROI-to-ROI distance. ddmra smooths these values with a moving average over distance-sorted edges and then averages values at identical distances. The smoothed curve is used for summary inference.

Two scalar summaries are tested:

intercept_35mm: the smoothed curve value at 35 mm.
slope_35_to_100mm: the value at 35 mm minus the value at 100 mm.

The intercept is sensitive to the overall magnitude of local residual artifact. The slope is sensitive to distance dependence, with larger positive values indicating stronger local than long-distance effects under the package’s sign conventions.
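
The smoothing and the two summaries can be sketched as follows. The window length, the handling of the curve's ends, and the use of linear interpolation to read the curve at exactly 35 mm and 100 mm are illustrative assumptions rather than ddmra defaults, and both function names are hypothetical. The window is assumed to be odd and no longer than the number of edges.

```python
import numpy as np


def smooth_curve(distances, values, window=3):
    """Moving average over distance-sorted edges, then average tied distances."""
    d = np.asarray(distances, dtype=float)
    v = np.asarray(values, dtype=float)
    order = np.argsort(d, kind="stable")
    d, v = d[order], v[order]
    kernel = np.ones(window)
    # Dividing by the convolved ones gives a true mean at the curve's ends,
    # where fewer than `window` edges are available.
    counts = np.convolve(np.ones_like(v), kernel, mode="same")
    smoothed = np.convolve(v, kernel, mode="same") / counts
    # Average smoothed values that share an identical distance.
    unique_d, inverse = np.unique(d, return_inverse=True)
    curve = np.bincount(inverse, weights=smoothed) / np.bincount(inverse)
    return unique_d, curve


def curve_summaries(unique_d, curve, near=35.0, far=100.0):
    """Intercept at `near` mm and slope as value(near) minus value(far)."""
    at_near = np.interp(near, unique_d, curve)
    at_far = np.interp(far, unique_d, curve)
    return float(at_near), float(at_near - at_far)


# Made-up edgewise values that decay with distance.
distances = [20, 35, 35, 60, 100, 120]
values = [0.30, 0.20, 0.24, 0.10, 0.02, 0.00]
unique_d, curve = smooth_curve(distances, values, window=3)
intercept_35mm, slope_35_to_100mm = curve_summaries(unique_d, curve)
```

Because the slope is defined as the near value minus the far value, a curve that decays with distance yields a positive slope.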

The workflow tests these summaries against permutation null curves with the plus-one finite-permutation correction, so the smallest possible p-value is 1 / (n_iters + 1). The per-pipeline p-values answer whether a pipeline’s artifact summary is larger than expected under that pipeline’s null model. They do not directly answer whether one pipeline is better than another.
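
The plus-one correction can be written in a few lines. This sketch assumes an upper-tail test (the observed summary is compared against null values that are at least as large), which matches the "larger than expected" wording above; the function name is hypothetical.

```python
import numpy as np


def plus_one_p_value(observed, null_values):
    """Upper-tail permutation p-value with the plus-one correction."""
    null_values = np.asarray(null_values, dtype=float)
    n_iters = null_values.size
    # Count the observed value itself as one member of the reference set.
    return (1 + np.sum(null_values >= observed)) / (n_iters + 1)


# 999 made-up null summaries between 0 and 0.998.
null_values = np.arange(999) / 1000
p_extreme = plus_one_p_value(2.0, null_values)   # no null value is as large
p_typical = plus_one_p_value(-1.0, null_values)  # every null value is larger
```

With 999 permutations the most extreme possible result gives 1 / 1000, so the p-value can never be exactly zero.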

Output files from `run_analyses`

analysis_values.tsv.gz: Edgewise unsmoothed values for the selected analyses.
smoothing_curves.tsv.gz: Distance-smoothed values used for intercept and slope summaries.
null_smoothing_curves.npz: Permutation null smoothing curves for each selected analysis.
ranks.tsv.gz: Diagnostic edgewise ranks of observed values against edgewise null values. These ranks are not p-values and should not be interpreted as inferential evidence.
qcrsfc_summary.tsv: Descriptive QC-FC benchmark summaries (median absolute QC-FC correlation and percentage of significant edges), written only when the qcrsfc analysis is requested.
run_denoising_summary.tsv: Run-level accounting for input volumes, QC thresholding, confound counts, nominal temporal degrees of freedom after confounds, retention after data loading, retention for analysis, and optional user-provided denoising or data-loss metrics.
log.tsv: Workflow messages, including retention counts and per-analysis intercept and slope p-values.
analysis_results.png: Summary figure showing the available analysis curves.

Pipeline-comparison workflow

The ddmra.workflows.run_pipeline_comparison() workflow directly supports comparisons among processing pipelines. It accepts a TSV file or pandas.DataFrame with one row per run and one column per pipeline. Each cell must contain a path to a 4D NIfTI file for that run and pipeline.

Example TSV:

preprocessed	XCP-D	tedana
sub-01_preproc_bold.nii.gz	sub-01_xcpd_bold.nii.gz	sub-01_tedana_bold.nii.gz
sub-02_preproc_bold.nii.gz	sub-02_xcpd_bold.nii.gz	sub-02_tedana_bold.nii.gz

Relative paths in a TSV are resolved relative to the TSV file. All selected pipeline columns must have the same number of rows, and each row is assumed to represent the same run across pipelines. The current implementation supports NIfTI files only.
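
To make the path-resolution rule concrete, the sketch below reads such a TSV with the Python standard library and resolves relative entries against the TSV's own directory. `load_pipeline_table` is a hypothetical helper written for illustration, not a ddmra function, and the file it reads is created in a temporary directory.

```python
import csv
import tempfile
from pathlib import Path


def load_pipeline_table(tsv_path):
    """Read a pipeline TSV and resolve relative paths against the TSV's folder.

    Returns a dict mapping each pipeline name to a list of paths, one per run.
    """
    tsv_path = Path(tsv_path)
    with tsv_path.open(newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        rows = list(reader)
        names = reader.fieldnames or []
    table = {}
    for name in names:
        paths = [Path(row[name]) for row in rows]
        # Absolute paths are kept; relative ones are anchored at the TSV folder.
        table[name] = [p if p.is_absolute() else tsv_path.parent / p for p in paths]
    return table


# Write a one-run, two-pipeline table and load it back.
with tempfile.TemporaryDirectory() as tmp:
    tsv = Path(tmp) / "pipelines.tsv"
    tsv.write_text(
        "preprocessed\ttedana\n"
        "sub-01_preproc_bold.nii.gz\tsub-01_tedana_bold.nii.gz\n"
    )
    table = load_pipeline_table(tsv)
```

Row order carries the run pairing, so every column must list the runs in the same order.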

The workflow has two layers:

It runs ddmra.workflows.run_analyses() separately for each selected pipeline and writes each pipeline’s outputs to a subdirectory.
By default, it performs direct pairwise statistical comparisons between pipelines.

Direct paired comparisons

Direct comparisons are performed for every selected pair of pipelines. For a given pair, ddmra uses the intersection of runs retained for analysis by both pipelines. It then recomputes the selected analysis curves on this paired run set and compares the pipelines’ smoothing-curve summaries.

For each analysis and pipeline pair, the observed difference is:

pipeline_1 summary - pipeline_2 summary

The null distribution is generated with paired run-wise pipeline-label swaps. For each permutation, the two pipeline labels are randomly swapped or not swapped within each run, and the pipeline difference is recomputed. This tests the null hypothesis that the two pipeline outputs are exchangeable within run. The procedure preserves:

the run-level QC time series,
run identity and pairing,
the selected atlas and distance structure,
the run set used for the pairwise comparison, and
run-level covariates used in QC-FC adjustment.

This paired label-swap test is preferable to comparing two independent per-pipeline p-values, because the pipelines are applied to the same runs and their estimates are not independent.

Direct comparison p-values are two-sided and use the same plus-one finite-permutation correction as the single-pipeline workflow. Increasing comparison_n_iters improves p-value resolution.
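
The paired label-swap logic can be sketched generically. Here `summary_fn` stands in for whatever maps one pipeline's run-level data to a scalar such as a smoothing-curve intercept or slope; in this toy example it is just a mean. The function name, its arguments, and the synthetic data are assumptions for illustration.

```python
import numpy as np


def paired_label_swap_test(data_1, data_2, summary_fn, n_iters=999, seed=0):
    """Two-sided paired label-swap test for summary_fn(data_1) - summary_fn(data_2).

    data_1, data_2: arrays with runs on the first axis, in the same run order.
    summary_fn: maps one pipeline's array to a scalar summary.
    """
    data_1 = np.asarray(data_1, dtype=float)
    data_2 = np.asarray(data_2, dtype=float)
    rng = np.random.default_rng(seed)
    observed = summary_fn(data_1) - summary_fn(data_2)
    null = np.empty(n_iters)
    for i in range(n_iters):
        # Independently swap (or keep) the two pipeline labels within each run.
        swap = rng.random(data_1.shape[0]) < 0.5
        swap = swap.reshape((-1,) + (1,) * (data_1.ndim - 1))
        perm_1 = np.where(swap, data_2, data_1)
        perm_2 = np.where(swap, data_1, data_2)
        null[i] = summary_fn(perm_1) - summary_fn(perm_2)
    # Two-sided p-value with the plus-one finite-permutation correction.
    p_value = (1 + np.sum(np.abs(null) >= abs(observed))) / (n_iters + 1)
    return observed, p_value


# Synthetic data: pipeline 1 has consistently larger values within every run.
rng = np.random.default_rng(2)
base = rng.normal(size=(30, 10))
data_2 = base
data_1 = base + 1.0 + 0.1 * rng.normal(size=(30, 10))
observed, p_value = paired_label_swap_test(data_1, data_2, lambda x: x.mean())
```

Swapping whole runs rather than individual edges keeps each run's data intact, which is what preserves the pairing described above.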

Interpreting pipeline-comparison results

pipeline_pairwise_comparisons.tsv contains one row per pipeline pair, analysis, and scalar contrast. Important columns include:

pipeline_1 and pipeline_2: The ordered pair being compared.
analysis: One of qcrsfc, highlow, or scrubbing.
contrast: Either intercept_35mm or slope_35_to_100mm.
pipeline_1_value and pipeline_2_value: The paired-run summary values for the two pipelines.
difference: pipeline_1_value - pipeline_2_value.
p_value: Two-sided paired label-swap p-value for the difference.
n_paired_runs: Number of runs retained by both pipelines and used in the direct comparison.

For DDMRA artifact summaries, a lower positive intercept or slope is often consistent with less residual distance-dependent artifact. However, users should inspect the sign and shape of the full smoothing curves. If values cross zero or if one pipeline changes the curve shape in a nonuniform way, the scalar intercept and slope should be treated as summaries rather than complete descriptions of performance.

The direct comparison outputs are:

pipeline_comparison_summary.tsv: One row per pipeline with the pipeline output directory.
pipeline_pairwise_comparisons.tsv: Pairwise intercept and slope differences with p-values.
pipeline_pairwise_smoothing_curves.tsv.gz: Observed paired smoothing curves and distance-wise differences.
pipeline_pairwise_nulls.npz: Null distributions for each pair, analysis, and scalar contrast.

Practical guidance

Use the same atlas, space, resolution, QC metric, and run order for all pipelines in a comparison.
Prefer direct paired comparison p-values over informal comparisons of separate per-pipeline p-values.
Report n_paired_runs and inspect run_denoising_summary.tsv for each pipeline.
Treat data-loss and temporal degrees-of-freedom differences as part of the denoising result, not as incidental bookkeeping.
Use a sufficiently large sample. QC-FC and high-low estimates are unstable in small samples, so ddmra warns when fewer than 30 runs are retained for these analyses and refuses to run with fewer than 10 (Parkes et al., 2018; Ciric et al., 2017).
Use enough permutations for the inferential claim. With 10000 permutations, the minimum p-value is 1 / 10001, or approximately 0.0001.
Correct for multiple comparisons when making claims across many pipeline pairs, analyses, or contrasts.
Do not interpret lower QC-FC or DDMRA values alone as proof of better neural signal preservation. Pair these metrics with reliability, identifiability, known network structure, or task/behavioral validity checks when those questions matter.
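
For the multiple-comparison guidance above, one common option is the Holm step-down adjustment, sketched here in NumPy. This is offered as an example of a family-wise correction that could be applied to the p_value column across pairs, analyses, and contrasts; it is not something the document says ddmra applies, and the p-values shown are made up.

```python
import numpy as np


def holm_adjust(p_values):
    """Holm step-down adjusted p-values (controls the family-wise error rate)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p, kind="stable")
    # Multiply the k-th smallest p-value by (m - k) for k = 0..m-1,
    # then enforce monotonicity and cap at 1.
    scaled = p[order] * (m - np.arange(m))
    adjusted_sorted = np.minimum(1.0, np.maximum.accumulate(scaled))
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted


# Three made-up comparison p-values.
adjusted = holm_adjust([0.01, 0.04, 0.03])
```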
