Methods and interpretation
ddmra evaluates residual distance-dependent motion-related artifact in
resting-state fMRI connectivity estimates. The package is designed for
benchmarking denoising or preprocessing strategies, not for estimating a
neuroscientific connectivity effect. A good denoising strategy should reduce
associations between a run-level quality-control metric and functional
connectivity while preserving enough temporal degrees of freedom and data for
the downstream scientific analysis.
The workflow follows the broad evaluation logic used in resting-state fMRI denoising benchmarks, where residual motion artifact is assessed with QC-FC associations, the distance dependence of those associations, scrubbing-related connectivity changes, and an accounting of data loss or temporal degrees of freedom. See, for example, Power et al. (2012), Power et al. (2018), Ciric et al. (2017), and Parkes et al. (2018).
The DDMRA analyses implemented here are based primarily on Power et al.
(2018), PNAS. ddmra is not a
line-for-line reproduction of the original analysis scripts. It keeps the core
scientific target of the original method, namely evaluating whether
motion-related effects on functional connectivity vary with the physical
distance between regions, but makes several implementation choices intended to
make the analyses reusable for modern denoising comparisons.
Differences from Power et al. (2018)
ddmra differs from the original Power et al. DDMRA implementation in the
following ways.
- Generalized inputs and atlases
Power et al. evaluated a specific analysis setting. ddmra accepts arbitrary 4D NIfTI runs and either a local labels-image atlas or selected Nilearn coordinate atlases. The rationale is to support controlled comparisons among preprocessing or denoising pipelines, as long as all inputs are in the same space and resolution.
- Scrubbing correlations are Fisher-z-transformed
In the scrubbing analysis, ddmra Fisher-z-transforms the full and scrubbed correlation coefficients before subtracting and averaging them. The original Power implementation did not apply this transform in the same way. The rationale is to keep all implemented connectivity summaries on a Fisher-z scale and to avoid averaging raw correlations directly.
- Scrubbing uses the opposite sign convention
The original scrubbing contrast and the ddmra contrast have opposite signs. ddmra computes full time series - scrubbed time series. The rationale is interpretive consistency: under this convention, larger positive local-distance effects point in the same artifact-like direction across QC-FC, high-low QC, and scrubbing summaries.
- Inference is performed on smoothing-curve summaries
ddmra treats edgewise ranks as diagnostics only. Inferential p-values are computed for the smoothing-curve intercept and slope against permutation null smoothing curves, using a plus-one finite-permutation correction. The rationale is to make inference target the reported distance-dependence summaries rather than treating edgewise ranks as p-values.
- QC-FC can adjust for run-level covariates
ddmra can residualize mean QC and edgewise connectivity with respect to run-level covariates before estimating QC-FC. The rationale is that denoising evaluations can be biased when motion is associated with site, age, diagnosis, acquisition, or other run-level variables.
- Data loss and temporal degrees of freedom are explicit outputs
ddmra writes run_denoising_summary.tsv with retained-run flags, volume counts, confound counts, nominal temporal degrees of freedom after confounds, and optional user-supplied denoising metrics. The rationale is that lower residual motion artifact is not automatically better if it is achieved by excessive volume loss or loss of temporal degrees of freedom.
- Pipeline comparisons use paired label-swap tests
The pipeline-comparison workflow adds direct pairwise tests that were not part of the original single-pipeline DDMRA method. These tests randomly swap pipeline labels within run and compare the pipeline difference in intercept and slope. The rationale is that pipelines are applied to the same runs, so direct paired inference is more appropriate than comparing independent per-pipeline p-values.
Single-pipeline workflow
The ddmra.workflows.run_analyses() workflow takes a list of 4D NIfTI
files and a matching list of one-dimensional QC arrays, one array per run. The
QC array is usually framewise displacement, but may be another time-resolved
quality measure if it is defined for every volume in the run. All images should
be in the same space and resolution and should be compatible with the selected
atlas.
The workflow performs the following steps:
Validate that each QC array is one-dimensional, non-empty, and finite.
Extract ROI time series with either a labels-image atlas or a coordinate sphere atlas.
Compute ROI-to-ROI distances and sort edges from short to long distance.
Build z-transformed functional connectivity matrices for analyses that use run-level connectivity.
Drop runs with NaN ROI time series or zero-variance ROI time series.
Optionally identify multivariate connectivity outliers using PCA scores and a robust covariance estimator.
Compute the selected artifact analyses.
Smooth edgewise analysis values over distance and assess the smoothing curve intercept and slope against permutation null distributions.
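The first and third steps above can be sketched with NumPy alone. This is an illustrative sketch, not ddmra's own code: the helper names validate_qc and sorted_edge_distances, the ROI coordinates, and the QC values are all hypothetical.

```python
import numpy as np


def validate_qc(qc):
    """Return qc as a float array, or raise if it is not 1D, non-empty, and finite."""
    qc = np.asarray(qc, dtype=float)
    if qc.ndim != 1:
        raise ValueError("QC array must be one-dimensional.")
    if qc.size == 0:
        raise ValueError("QC array must not be empty.")
    if not np.all(np.isfinite(qc)):
        raise ValueError("QC array must contain only finite values.")
    return qc


def sorted_edge_distances(coords):
    """Return upper-triangle edge indices and Euclidean distances, short to long."""
    coords = np.asarray(coords, dtype=float)
    rows, cols = np.triu_indices(coords.shape[0], k=1)
    distances = np.linalg.norm(coords[rows] - coords[cols], axis=1)
    order = np.argsort(distances, kind="stable")
    return rows[order], cols[order], distances[order]


# Hypothetical ROI centers (mm) and a hypothetical framewise-displacement trace.
roi_coords = np.array([[0.0, 0.0, 0.0], [30.0, 0.0, 0.0], [0.0, 40.0, 0.0]])
fd = validate_qc([0.05, 0.12, 0.31, 0.08])
edge_rows, edge_cols, edge_dist = sorted_edge_distances(roi_coords)
```

Sorting the edges once and reusing the same order for every analysis keeps the distance axis identical across QC-FC, high-low, and scrubbing curves.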
The workflow writes run_denoising_summary.tsv for every run. This file is
important for interpretation because apparent denoising gains can be coupled to
data loss, temporal degrees of freedom, or complete removal of difficult runs.
Direct scientific comparisons of denoising strategies should consider these
columns alongside the artifact metrics.
QC-FC analysis
QC-FC measures the association between mean run QC and each functional
connectivity edge across runs. For each retained run, ddmra averages the
QC time series to one run-level value and computes Fisher-z-transformed ROI
correlations. For each edge, it then correlates run-level QC with the edge’s
connectivity values across runs and Fisher-z-transforms that QC-FC
correlation.
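As a rough illustration of the computation just described, the following NumPy sketch derives one Fisher-z-transformed QC-FC value per edge from simulated data. It is not ddmra's implementation; the array names and the simulated inputs are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_runs, n_vols, n_rois = 12, 100, 5

# Hypothetical inputs: ROI time series and framewise displacement for each run.
roi_ts = rng.standard_normal((n_runs, n_vols, n_rois))
fd = rng.uniform(0.05, 0.5, size=(n_runs, n_vols))

# One run-level QC value per run.
mean_qc = fd.mean(axis=1)

# Fisher-z-transformed ROI-to-ROI correlations for the upper-triangle edges.
rows, cols = np.triu_indices(n_rois, k=1)
z_conn = np.array([np.arctanh(np.corrcoef(ts.T)[rows, cols]) for ts in roi_ts])

# Pearson correlation between run-level QC and each edge, computed across runs.
qc_c = mean_qc - mean_qc.mean()
conn_c = z_conn - z_conn.mean(axis=0)
qcfc_r = (qc_c @ conn_c) / (np.linalg.norm(qc_c) * np.linalg.norm(conn_c, axis=0))

# Fisher-z-transformed QC-FC, one value per edge.
qcfc_z = np.arctanh(qcfc_r)
```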
If run_covariates are supplied, QC-FC is computed after residualizing both
the run-level QC vector and edgewise connectivity values with respect to those
covariates. This is useful when age, site, group, acquisition, or other
run-level factors could otherwise confound the QC-FC estimate.
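One common way to residualize is an ordinary least-squares fit with an intercept, sketched below. Whether ddmra uses exactly this model is an assumption; the residualize helper and the simulated age covariate are hypothetical.

```python
import numpy as np


def residualize(values, covariates):
    """Remove the least-squares fit of covariates (plus an intercept) from values."""
    values = np.asarray(values, dtype=float)
    covariates = np.asarray(covariates, dtype=float)
    if covariates.ndim == 1:
        covariates = covariates[:, None]
    design = np.column_stack([np.ones(covariates.shape[0]), covariates])
    beta, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ beta


rng = np.random.default_rng(1)
n_runs, n_edges = 40, 6
age = rng.uniform(8, 30, size=n_runs)  # hypothetical run-level covariate

# Simulated data in which age drives both run-level QC and connectivity.
mean_qc = 0.5 - 0.01 * age + 0.02 * rng.standard_normal(n_runs)
z_conn = 0.02 * age[:, None] + 0.1 * rng.standard_normal((n_runs, n_edges))

# Residualize both sides before correlating them, as in a partial correlation.
qc_resid = residualize(mean_qc, age)
conn_resid = residualize(z_conn, age)
```

Residualizing both the QC vector and the connectivity values matters: removing the covariate from only one side yields a semipartial rather than a partial association.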
Interpretation:
Values near zero indicate little linear association between run quality and that connectivity edge.
A distance-dependent curve with stronger short-distance effects than long-distance effects is consistent with residual motion-related artifact.
QC-FC is not a measure of neural signal preservation. It should be interpreted with data-loss and reliability or validity benchmarks when available.
ddmra also writes two descriptive QC-FC benchmark summaries to
qcrsfc_summary.tsv: the median absolute QC-FC correlation and the percentage
of edges with a significant QC-FC correlation (two-sided, uncorrected, at
alpha = 0.05). These are the standard scalar QC-FC summaries reported in the
denoising literature (Ciric et al., 2017; Parkes et al., 2018). Lower values
indicate less residual association between run quality and connectivity, and
under a pipeline with no residual QC-FC the percentage of significant edges
should approach 100 * alpha. These summaries are descriptive diagnostics;
inference in ddmra is based on the smoothing-curve intercept and slope.
High-low QC analysis
The high-low analysis splits retained runs into high-QC and low-QC groups by
mean QC. For each edge, it subtracts the mean connectivity of the low-QC group
from the mean connectivity of the high-QC group. The split is controlled by the
highlow_cut fraction: 0.5 (default) is a median split that uses every
run, while smaller values (for example 0.25 for the top and bottom quartiles)
contrast the QC extremes and drop the middle runs. Extreme-group contrasts are
more sensitive to motion effects but use fewer runs.
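The effect of the cut fraction can be seen in a small sketch. The highlow_contrast helper below is hypothetical, and its handling of odd run counts and ties (flooring the group size, stable sorting) is an assumption rather than documented ddmra behavior.

```python
import numpy as np


def highlow_contrast(z_conn, mean_qc, highlow_cut=0.5):
    """Mean connectivity of the high-QC group minus that of the low-QC group."""
    z_conn = np.asarray(z_conn, dtype=float)
    mean_qc = np.asarray(mean_qc, dtype=float)
    if not 0 < highlow_cut <= 0.5:
        raise ValueError("highlow_cut must be in (0, 0.5].")
    n_group = int(np.floor(mean_qc.size * highlow_cut))
    if n_group < 1:
        raise ValueError("Too few runs for the requested cut.")
    order = np.argsort(mean_qc, kind="stable")
    low, high = order[:n_group], order[-n_group:]
    return z_conn[high].mean(axis=0) - z_conn[low].mean(axis=0)


# Eight hypothetical runs and one hypothetical edge that tracks QC exactly.
mean_qc = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8])
z_conn = mean_qc[:, None].copy()

median_split = highlow_contrast(z_conn, mean_qc, highlow_cut=0.5)    # uses all 8 runs
quartile_split = highlow_contrast(z_conn, mean_qc, highlow_cut=0.25)  # uses 4 runs
```

In this toy case the quartile contrast is larger than the median-split contrast, but it is estimated from half as many runs.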
Interpretation:
Values reflect the edgewise difference between higher-motion and lower-motion runs.
The analysis is intentionally simple and is useful as a complementary artifact benchmark.
Because it depends on a group split, it should not be treated as a substitute for covariate-adjusted QC-FC when continuous QC information and covariates are important.
Scrubbing analysis
The scrubbing analysis compares connectivity before and after removing volumes
whose QC values exceed qc_thresh. Within each run, the workflow computes
connectivity from the full time series and from the scrubbed time series, then
averages edgewise differences across retained runs. Runs are included in the
scrubbing analysis only when at least one volume is scrubbed and at least half
of the volumes remain.
ddmra uses the sign convention full time series - scrubbed time series.
This convention differs from the original Power et al. implementation, but it
keeps the direction of larger positive DDMRA effects similar across the
implemented analyses.
Because the scrubbing analysis Fisher-z-transforms raw connectivity, near-perfect
short-distance edge correlations are clipped to +/-0.999 before the transform to
keep the Fisher-z values finite. The number of clipped full and scrubbed edge
correlations is reported in the run log so that any compression of the most extreme
edges is visible.
Interpretation:
Larger effects indicate connectivity changes associated with removing high-QC volumes.
Scrubbing results are conditional on the selected QC threshold and the availability of runs with both retained and scrubbed volumes.
A method that reduces scrubbing-related effects by discarding many volumes should be evaluated together with temporal degrees of freedom and retained volume counts.
Distance smoothing, intercepts, and slopes
All three analyses produce edgewise values ordered by ROI-to-ROI distance.
ddmra smooths these values with a moving average over distance-sorted edges
and then averages values at identical distances. The smoothed curve is used for
summary inference.
Two scalar summaries are tested:
intercept_35mm: the smoothed curve value at 35 mm.
slope_35_to_100mm: the value at 35 mm minus the value at 100 mm.
The intercept is sensitive to the overall magnitude of local residual artifact. The slope is sensitive to distance dependence, with larger positive values indicating stronger local than long-distance effects under the package’s sign conventions.
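The smoothing and the two summaries can be sketched as follows. Several details here are assumptions: the window is counted in edges, windows are shortened at the ends of the curve, and the values at 35 mm and 100 mm are read off by linear interpolation. The function names are hypothetical.

```python
import numpy as np


def smooth_over_distance(distances, values, window):
    """Moving average over distance-sorted edges, then average at identical distances."""
    distances = np.asarray(distances, dtype=float)
    values = np.asarray(values, dtype=float)
    order = np.argsort(distances, kind="stable")
    d_sorted, v_sorted = distances[order], values[order]
    kernel = np.ones(window)
    # Divide by the number of edges actually inside each window so that the
    # ends of the curve are not pulled toward zero.
    sums = np.convolve(v_sorted, kernel, mode="same")
    counts = np.convolve(np.ones_like(v_sorted), kernel, mode="same")
    smoothed = sums / counts
    unique_d, inverse = np.unique(d_sorted, return_inverse=True)
    curve = np.bincount(inverse, weights=smoothed) / np.bincount(inverse)
    return unique_d, curve


def curve_summaries(unique_d, curve):
    """Intercept at 35 mm and slope defined as value(35 mm) - value(100 mm)."""
    at_35 = float(np.interp(35.0, unique_d, curve))
    at_100 = float(np.interp(100.0, unique_d, curve))
    return {"intercept_35mm": at_35, "slope_35_to_100mm": at_35 - at_100}


# Hypothetical edgewise values that decline linearly with distance.
edge_dist = np.linspace(10.0, 150.0, 281)
edge_values = 0.2 - 0.001 * edge_dist
unique_d, curve = smooth_over_distance(edge_dist, edge_values, window=11)
summaries = curve_summaries(unique_d, curve)
```

With this sign convention, a curve that is higher at short distances than at long distances gives a positive slope_35_to_100mm.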
The workflow tests these summaries against permutation null curves with the
plus-one finite-permutation correction, so the smallest possible p-value is
1 / (n_iters + 1). The per-pipeline p-values answer whether a pipeline’s
artifact summary is larger than expected under that pipeline’s null model. They
do not directly answer whether one pipeline is better than another.
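The plus-one correction can be written in a few lines. The sketch below assumes a one-sided test in the "larger than expected" direction, matching the description above, and uses simulated null values in place of real null smoothing-curve summaries.

```python
import numpy as np


def plus_one_p_value(observed, null_values):
    """One-sided permutation p-value with the plus-one finite-permutation correction."""
    null_values = np.asarray(null_values, dtype=float)
    n_extreme = int(np.sum(null_values >= observed))
    return (n_extreme + 1) / (null_values.size + 1)


rng = np.random.default_rng(4)
n_iters = 9999
null_intercepts = rng.standard_normal(n_iters)  # hypothetical null summaries

# An observed value beyond every null value gives the smallest attainable p-value.
p_min = plus_one_p_value(10.0, null_intercepts)

# An observed value below every null value gives p = 1.
p_max = plus_one_p_value(-10.0, null_intercepts)
```

Because the observed statistic is counted as one of the permutations, the p-value can never be exactly zero.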
Output files from run_analyses
analysis_values.tsv.gz: Edgewise unsmoothed values for the selected analyses.
smoothing_curves.tsv.gz: Distance-smoothed values used for intercept and slope summaries.
null_smoothing_curves.npz: Permutation null smoothing curves for each selected analysis.
ranks.tsv.gz: Diagnostic edgewise ranks of observed values against edgewise null values. These ranks are not p-values and should not be interpreted as inferential evidence.
qcrsfc_summary.tsv: Descriptive QC-FC benchmark summaries (median absolute QC-FC correlation and percentage of significant edges), written only when the qcrsfc analysis is requested.
run_denoising_summary.tsv: Run-level accounting for input volumes, QC thresholding, confound counts, nominal temporal degrees of freedom after confounds, retention after data loading, retention for analysis, and optional user-provided denoising or data-loss metrics.
log.tsv: Workflow messages, including retention counts and per-analysis intercept and slope p-values.
analysis_results.png: Summary figure showing the available analysis curves.
Pipeline-comparison workflow
The ddmra.workflows.run_pipeline_comparison() workflow directly supports
comparisons among processing pipelines. It accepts a TSV file or
pandas.DataFrame with one row per run and one column per pipeline. Each
cell must contain a path to a 4D NIfTI file for that run and pipeline.
Example TSV:
preprocessed XCP-D tedana
sub-01_preproc_bold.nii.gz sub-01_xcpd_bold.nii.gz sub-01_tedana_bold.nii.gz
sub-02_preproc_bold.nii.gz sub-02_xcpd_bold.nii.gz sub-02_tedana_bold.nii.gz
Relative paths in a TSV are resolved relative to the TSV file. All selected pipeline columns must have the same number of rows, and each row is assumed to represent the same run across pipelines. The current implementation supports NIfTI files only.
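The layout can be produced with the standard library, as in the sketch below. The file name pipelines.tsv is hypothetical, and the path-resolution step only illustrates the rule stated above; it is not ddmra's own loader.

```python
import csv
import tempfile
from pathlib import Path

pipelines = ["preprocessed", "XCP-D", "tedana"]
suffixes = {"preprocessed": "preproc", "XCP-D": "xcpd", "tedana": "tedana"}
subjects = ["sub-01", "sub-02"]

with tempfile.TemporaryDirectory() as tmp_dir:
    tsv_path = Path(tmp_dir) / "pipelines.tsv"  # hypothetical file name

    # One row per run, one column per pipeline, tab-separated.
    with tsv_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=pipelines, delimiter="\t")
        writer.writeheader()
        for sub in subjects:
            writer.writerow({p: f"{sub}_{suffixes[p]}_bold.nii.gz" for p in pipelines})

    with tsv_path.open(newline="") as f:
        table = list(csv.DictReader(f, delimiter="\t"))

    # Resolve relative paths against the directory that contains the TSV.
    resolved = [{p: tsv_path.parent / row[p] for p in pipelines} for row in table]
```

Keeping the rows in the same run order for every column is what lets each row be treated as one run seen through several pipelines.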
The workflow has two layers:
It runs ddmra.workflows.run_analyses() separately for each selected pipeline and writes each pipeline’s outputs to a subdirectory.
By default, it performs direct pairwise statistical comparisons between pipelines.
Direct paired comparisons
Direct comparisons are performed for every selected pair of pipelines. For a
given pair, ddmra uses the intersection of runs retained for analysis by
both pipelines. It then recomputes the selected analysis curves on this paired
run set and compares the pipelines’ smoothing-curve summaries.
For each analysis and pipeline pair, the observed difference is:
pipeline_1 summary - pipeline_2 summary
The null distribution is generated with paired run-wise pipeline-label swaps. For each permutation, the two pipeline labels are randomly swapped or not swapped within each run, and the pipeline difference is recomputed. This tests the null hypothesis that the two pipeline outputs are exchangeable within run. The procedure preserves:
the run-level QC time series,
run identity and pairing,
the selected atlas and distance structure,
the run set used for the pairwise comparison, and
run-level covariates used in QC-FC adjustment.
This paired label-swap test is preferable to comparing two independent per-pipeline p-values, because the pipelines are applied to the same runs and their estimates are not independent.
Direct comparison p-values are two-sided and use the same plus-one
finite-permutation correction as the single-pipeline workflow. Increasing
comparison_n_iters improves p-value resolution.
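The label-swap logic can be sketched generically. In the example below, summary_fn stands in for a smoothing-curve intercept or slope; here it is simply the mean QC-FC correlation across edges, which is an assumption made to keep the example short. The function names and simulated pipelines are hypothetical.

```python
import numpy as np


def paired_label_swap_test(conn_1, conn_2, summary_fn, n_iters=999, seed=0):
    """Two-sided paired label-swap test for summary_fn(conn_1) - summary_fn(conn_2)."""
    rng = np.random.default_rng(seed)
    observed = summary_fn(conn_1) - summary_fn(conn_2)
    null = np.empty(n_iters)
    for i in range(n_iters):
        # Independently swap, or do not swap, the two pipeline labels within each run.
        swap = rng.random(conn_1.shape[0]) < 0.5
        perm_1 = np.where(swap[:, None], conn_2, conn_1)
        perm_2 = np.where(swap[:, None], conn_1, conn_2)
        null[i] = summary_fn(perm_1) - summary_fn(perm_2)
    # Two-sided p-value with the plus-one finite-permutation correction.
    p_value = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_iters + 1)
    return observed, p_value, null


rng = np.random.default_rng(2)
n_runs, n_edges = 30, 20
mean_qc = rng.uniform(0.05, 0.5, size=n_runs)  # run-level QC, held fixed across swaps

# Hypothetical pipelines: the first retains a QC-related artifact, the second does not.
conn_1 = 0.5 * mean_qc[:, None] + 0.05 * rng.standard_normal((n_runs, n_edges))
conn_2 = 0.05 * rng.standard_normal((n_runs, n_edges))


def mean_qcfc(conn):
    """Stand-in summary: mean Pearson correlation between run-level QC and each edge."""
    qc_c = mean_qc - mean_qc.mean()
    conn_c = conn - conn.mean(axis=0)
    r = (qc_c @ conn_c) / (np.linalg.norm(qc_c) * np.linalg.norm(conn_c, axis=0))
    return float(r.mean())


observed, p_value, null = paired_label_swap_test(conn_1, conn_2, mean_qcfc)
```

Swapping whole runs between the two labels, rather than shuffling runs across subjects, is what preserves run identity, the QC values, and the pairing listed above.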
Interpreting pipeline-comparison results
pipeline_pairwise_comparisons.tsv contains one row per pipeline pair,
analysis, and scalar contrast. Important columns include:
pipeline_1 and pipeline_2: The ordered pair being compared.
analysis: One of qcrsfc, highlow, or scrubbing.
contrast: Either intercept_35mm or slope_35_to_100mm.
pipeline_1_value and pipeline_2_value: The paired-run summary values for the two pipelines.
difference: pipeline_1_value - pipeline_2_value.
p_value: Two-sided paired label-swap p-value for the difference.
n_paired_runs: Number of runs retained by both pipelines and used in the direct comparison.
For DDMRA artifact summaries, a lower positive intercept or slope is often consistent with less residual distance-dependent artifact. However, users should inspect the sign and shape of the full smoothing curves. If values cross zero or if one pipeline changes the curve shape in a nonuniform way, the scalar intercept and slope should be treated as summaries rather than complete descriptions of performance.
The direct comparison outputs are:
pipeline_comparison_summary.tsv: One row per pipeline with the pipeline output directory.
pipeline_pairwise_comparisons.tsv: Pairwise intercept and slope differences with p-values.
pipeline_pairwise_smoothing_curves.tsv.gz: Observed paired smoothing curves and distance-wise differences.
pipeline_pairwise_nulls.npz: Null distributions for each pair, analysis, and scalar contrast.
Practical guidance
Use the same atlas, space, resolution, QC metric, and run order for all pipelines in a comparison.
Prefer direct paired comparison p-values over informal comparisons of separate per-pipeline p-values.
Report n_paired_runs and inspect run_denoising_summary.tsv for each pipeline.
Treat data-loss and temporal degrees-of-freedom differences as part of the denoising result, not as incidental bookkeeping.
Use a sufficiently large sample. QC-FC and high-low estimates are unstable in small samples, so ddmra warns when fewer than 30 runs are retained for these analyses and refuses to run with fewer than 10 (Parkes et al., 2018; Ciric et al., 2017).
Use enough permutations for the inferential claim. With 10000 permutations, the minimum p-value is approximately 0.0001.
Correct for multiple comparisons when making claims across many pipeline pairs, analyses, or contrasts.
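As one possible approach, the Holm-Bonferroni step-down adjustment can be applied to the p_value column collected across pairs, analyses, and contrasts. This is offered as an example only: the text above does not prescribe a correction method, and the holm_adjust helper and the p-values below are hypothetical.

```python
import numpy as np


def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p, kind="stable")
    # Multiply the k-th smallest p-value by (m - k), then enforce monotonicity.
    scaled = (m - np.arange(m)) * p[order]
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted


# Hypothetical p-values from four pairwise comparisons.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = holm_adjust(raw_p)  # [0.03, 0.06, 0.06, 0.02]
```

Note that an adjusted p-value can never fall below the permutation floor multiplied by the number of tests, so the permutation count limits what can survive correction.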
Do not interpret lower QC-FC or DDMRA values alone as proof of better neural signal preservation. Pair these metrics with reliability, identifiability, known network structure, or task/behavioral validity checks when those questions matter.