SDT measures assume internal evidence follows normal distributions. They disentangle metacognitive sensitivity from bias.
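d’ is the standard measure of perceptual discrimination ability (Type 1 sensitivity). It quantifies how well the observer separates the two stimulus distributions.
\[d' = \Phi^{-1}(H) - \Phi^{-1}(F)\]
where H = hit rate, F = false-alarm rate, and \(\Phi^{-1}\) = probit function (the inverse of the standard normal CDF).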
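As a minimal sketch, Type 1 sensitivity can be computed from a hit rate and a false-alarm rate with the probit transform in Python's standard library. The function name `d_prime` is an illustrative choice, not part of any package described here. The sketch applies no correction for rates of exactly 0 or 1, where the probit is undefined.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Type 1 sensitivity: probit(H) - probit(F).

    Both rates must lie strictly between 0 and 1; inv_cdf raises
    StatisticsError otherwise (no edge correction is applied here).
    """
    probit = NormalDist().inv_cdf  # inverse standard-normal CDF
    return probit(hit_rate) - probit(fa_rate)


# Example: symmetric performance with H = 0.84 and F = 0.16
print(round(d_prime(0.84, 0.16), 2))
```

Equal hit and false-alarm rates give a value of 0, meaning the observer cannot separate the two stimulus distributions.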
meta-d’ is the value of d’ that a hypothetical ideal observer would need to produce the observed pattern of Type 2 (confidence) responses. It is fit by maximum likelihood to the Type 2 ROC curve.
\[\text{meta-}d' = \arg\max_{m}\, \ell(m \mid d', c)\]
where \(\ell\) is the log-likelihood under the SDT model with metacognitive sensitivity \(m\), Type 1 sensitivity \(d'\), and criteria \(c\). The maximum-likelihood estimate is obtained with the L-BFGS-B optimiser.
M-Ratio, meta-d’ / d’, normalises metacognitive sensitivity by task performance. Values near 1 indicate ideal metacognition; values < 1 suggest information loss in the metacognitive system.
Range: any positive real number. Unstable when d’ ≈ 0, because d’ is the denominator.
M-Diff, meta-d’ − d’, is the difference-based alternative to M-Ratio. Negative values indicate a metacognitive deficit; positive values indicate a surplus. It tends to over-correct at extreme d’ values.
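The ratio and difference measures can be sketched in a few lines once meta-d’ and d’ have been estimated. The function names and the `min_d` guard are assumptions made for illustration; the threshold below which the ratio is treated as undefined is arbitrary and is not taken from the source.

```python
import math


def m_ratio(meta_d: float, d: float, min_d: float = 1e-6) -> float:
    """Ratio of meta-d' to d'.

    Returns NaN when |d'| < min_d, because the ratio is unstable
    near zero. min_d is a hypothetical guard value.
    """
    if abs(d) < min_d:
        return math.nan
    return meta_d / d


def m_diff(meta_d: float, d: float) -> float:
    """Difference between meta-d' and d'."""
    return meta_d - d


# Example: metacognitive sensitivity below task sensitivity
print(m_ratio(0.8, 1.0), m_diff(0.8, 1.0))
```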
Association measures quantify how strongly confidence tracks accuracy on a trial-by-trial basis. They require no SDT model assumptions and can be computed from raw trial data.
ΔConf is the simplest association measure: the difference in mean confidence between correct and incorrect trials. Positive values indicate higher confidence on correct trials.
Units: raw confidence scale. Strongly depends on task d’.
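A minimal sketch of this difference in mean confidence, assuming two equal-length sequences of confidence ratings and 0/1 accuracy values. The name `delta_conf` is illustrative.

```python
from statistics import fmean


def delta_conf(confidence, correct):
    """Mean confidence on correct trials minus mean confidence on errors."""
    conf_correct = [c for c, a in zip(confidence, correct) if a]
    conf_error = [c for c, a in zip(confidence, correct) if not a]
    if not conf_correct or not conf_error:
        raise ValueError("need at least one correct and one incorrect trial")
    return fmean(conf_correct) - fmean(conf_error)


# Example on a 1-4 confidence scale
print(delta_conf([4, 3, 2, 1], [1, 1, 0, 0]))  # 3.5 - 1.5 = 2.0
```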
The Type 2 AUROC (AUC2) is the area under the curve that plots Type 2 hit rate against Type 2 false-alarm rate across all confidence thresholds. It is the oldest metacognitive measure (proposed in the 1950s) and is computed here from empirical response frequencies.
Range: 0.5 (chance) – 1.0 (perfect). Strongly depends on task d’.
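The empirical Type 2 ROC area can be sketched as follows. This is one plausible reading of "computed from empirical response frequencies": at each confidence threshold, the Type 2 hit rate is the proportion of correct trials at or above the threshold and the Type 2 false-alarm rate is the same proportion for incorrect trials. The area is then taken with the trapezoid rule. The function name `auc2` and the trapezoid choice are assumptions.

```python
def auc2(confidence, correct):
    """Area under the empirical Type 2 ROC curve (trapezoid rule)."""
    conf_correct = [c for c, a in zip(confidence, correct) if a]
    conf_error = [c for c, a in zip(confidence, correct) if not a]
    if not conf_correct or not conf_error:
        raise ValueError("need at least one correct and one incorrect trial")

    # Sweep thresholds from strictest to most lenient; start at (0, 0).
    points = [(0.0, 0.0)]
    for t in sorted(set(confidence), reverse=True):
        hit = sum(c >= t for c in conf_correct) / len(conf_correct)
        fa = sum(c >= t for c in conf_error) / len(conf_error)
        points.append((fa, hit))
    # The lowest threshold includes every trial, so the curve ends at (1, 1).

    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


print(auc2([4, 3, 2, 1], [1, 1, 0, 0]))  # confidence separates perfectly
```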
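Gamma (\(\gamma\)) is the Goodman–Kruskal rank correlation between trial-by-trial confidence and accuracy. It is the most common metacognitive measure in the memory literature.
\[\gamma = \frac{C - D}{C + D}\]
where C = number of concordant pairs (the higher-confidence trial is correct and the lower-confidence trial is incorrect) and D = number of discordant pairs (the reverse). Pairs tied on confidence or on accuracy are excluded. Range: \(-1\) to \(+1\).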
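The pair counting behind this rank correlation can be sketched with a direct loop over all trial pairs. The function name is illustrative, and the quadratic-time loop is chosen for clarity rather than speed. Pairs tied on either variable contribute to neither count.

```python
from itertools import combinations
from math import nan


def goodman_kruskal_gamma(confidence, correct):
    """(C - D) / (C + D) over all pairs of trials; ties are ignored."""
    concordant = discordant = 0
    for (c1, a1), (c2, a2) in combinations(zip(confidence, correct), 2):
        product = (c1 - c2) * (a1 - a2)
        if product > 0:      # higher confidence goes with higher accuracy
            concordant += 1
        elif product < 0:    # higher confidence goes with lower accuracy
            discordant += 1
    if concordant + discordant == 0:
        return nan           # undefined when every pair is tied
    return (concordant - discordant) / (concordant + discordant)


print(goodman_kruskal_gamma([4, 3, 2, 1], [1, 1, 0, 0]))
```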
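Phi (\(\phi\)) is the Pearson product-moment correlation between trial-by-trial confidence rating and binary accuracy. It assumes a linear relationship between the two.
\[\phi = \frac{\sum_i (c_i - \bar{c})(a_i - \bar{a})}{\sqrt{\sum_i (c_i - \bar{c})^2}\,\sqrt{\sum_i (a_i - \bar{a})^2}}\]
where \(c_i\) = confidence on trial \(i\), \(a_i \in \{0,1\}\) = accuracy on trial \(i\), and \(\bar{c}\), \(\bar{a}\) are their means. Range: \(-1\) to \(+1\).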
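A minimal sketch of the Pearson correlation between confidence and 0/1 accuracy, written out by hand so that it does not depend on a particular Python version. The function name is an assumption.

```python
from math import sqrt
from statistics import fmean


def confidence_accuracy_correlation(confidence, correct):
    """Pearson correlation between confidence ratings and binary accuracy."""
    mean_c, mean_a = fmean(confidence), fmean(correct)
    cov = sum((c - mean_c) * (a - mean_a) for c, a in zip(confidence, correct))
    ss_c = sum((c - mean_c) ** 2 for c in confidence)
    ss_a = sum((a - mean_a) ** 2 for a in correct)
    if ss_c == 0 or ss_a == 0:
        raise ValueError("correlation is undefined when either variable is constant")
    return cov / sqrt(ss_c * ss_a)


print(confidence_accuracy_correlation([4, 3, 2, 1], [1, 1, 0, 0]))
```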
These item-level measures compare normalised confidence to binary accuracy. Confidence is rescaled to [0, 1] before computation. They were originally described for free-recall and knowledge-monitoring tasks.
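The bias index is the signed mean discrepancy between normalised confidence and accuracy. Positive = overconfident; negative = underconfident.
\[\text{Bias} = \frac{1}{N}\sum_{i=1}^{N} (\hat{c}_i - p_i)\]
where \(\hat{c}_i\) = confidence normalised to [0,1] and \(p_i \in \{0,1\}\) = accuracy on item \(i\). Range: \(-1\) to \(+1\).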
This index is the mean squared error between normalised confidence and accuracy, sometimes called the Brier score. Lower values are better.
Range: 0 (perfect) – 1 (worst). Sensitive to the magnitude of error but not its direction, because squaring discards the sign.
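The signed discrepancy and the mean squared error can be sketched together, since both start from the same normalised confidence. The sketch assumes min-max rescaling from the endpoints of the rating scale. The source states only that confidence is rescaled to [0, 1], so the rescaling rule and all function names are assumptions.

```python
def normalise(confidence, c_min, c_max):
    """Min-max rescale ratings from [c_min, c_max] to [0, 1] (assumed rule)."""
    span = c_max - c_min
    return [(c - c_min) / span for c in confidence]


def bias_index(c_hat, correct):
    """Signed mean of (normalised confidence - accuracy)."""
    return sum(c - p for c, p in zip(c_hat, correct)) / len(c_hat)


def brier_score(c_hat, correct):
    """Mean of squared (normalised confidence - accuracy)."""
    return sum((c - p) ** 2 for c, p in zip(c_hat, correct)) / len(c_hat)


ratings = [4, 4, 1, 4]    # 1-4 confidence scale
accuracy = [1, 0, 0, 1]
c_hat = normalise(ratings, c_min=1, c_max=4)
print(bias_index(c_hat, accuracy), brier_score(c_hat, accuracy))
```

In this example the only discrepancy is one maximally confident error, so both indices equal 0.25. The sign of the bias index marks it as overconfidence.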
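The discrimination index measures how well the observer assigns higher confidence to correct versus incorrect items, weighted by the proportion of each response type. It is related to ΔConf but with count-weighting.
\[\text{Discrimination} = \frac{N_c\,\bar{c}_{\text{correct}} - N_e\,\bar{c}_{\text{error}}}{N}\]
where \(\bar{c}_{\text{correct}}, \bar{c}_{\text{error}}\) = mean confidence on correct / error trials; \(N_c, N_e\) = number of correct / error trials; \(N = N_c + N_e\). Range: \(-c_{\max}\) to \(+c_{\max}\).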
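A minimal sketch of a count-weighted discrimination score, assuming the form given by Schraw (2009): the summed confidence on correct trials minus the summed confidence on error trials, divided by the total number of trials. That form is consistent with the range stated above, but it is a reconstruction and the function name is hypothetical.

```python
def discrimination_index(confidence, correct):
    """(sum of confidence on correct - sum of confidence on errors) / N.

    Equivalent to (N_c * mean_correct - N_e * mean_error) / N.
    Assumed form; see the lead-in.
    """
    n = len(confidence)
    if n == 0:
        raise ValueError("need at least one trial")
    sum_correct = sum(c for c, a in zip(confidence, correct) if a)
    sum_error = sum(c for c, a in zip(confidence, correct) if not a)
    return (sum_correct - sum_error) / n


print(discrimination_index([4, 3, 2, 1], [1, 1, 0, 0]))  # (7 - 3) / 4 = 1.0
```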
Scatter assesses whether confidence judgments are more variable on correct trials than on error trials. Positive scatter means confidence is more spread out on correct trials.
Range: \(-\infty\) to \(+\infty\). Near zero = equal variability.
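The variability comparison can be sketched as a count-weighted difference of variances, assuming the form given by Schraw (2009): \((N_c \cdot \mathrm{Var}_{\text{correct}} - N_e \cdot \mathrm{Var}_{\text{error}}) / N\). The use of the population variance here is an assumption, as is the function name; a sample-variance version would differ slightly for small trial counts.

```python
from statistics import pvariance


def scatter_index(confidence, correct):
    """Count-weighted variance difference between correct and error trials.

    Uses population variance (an assumption; see the lead-in).
    """
    conf_correct = [c for c, a in zip(confidence, correct) if a]
    conf_error = [c for c, a in zip(confidence, correct) if not a]
    if not conf_correct or not conf_error:
        raise ValueError("need at least one correct and one incorrect trial")
    n = len(conf_correct) + len(conf_error)
    weighted_correct = len(conf_correct) * pvariance(conf_correct)
    weighted_error = len(conf_error) * pvariance(conf_error)
    return (weighted_correct - weighted_error) / n


# Confidence varies on correct trials (1, 3) but not on errors (2, 2)
print(scatter_index([1, 3, 2, 2], [1, 1, 0, 0]))
```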
Recommended reading:
Maniscalco, B., & Lau, H. (2014). Signal detection theory analysis of type 1 and type 2 data: meta-d’, response-specific meta-d’, and the unequal variance SDT model. In S. M. Fleming & C. D. Frith (Eds.), The Cognitive Neuroscience of Metacognition (pp. 25–66). Springer.
Rahnev, D. (2025). A comprehensive assessment of current methods for measuring metacognition. Nature Communications, 16, 701. https://doi.org/10.1038/s41467-025-56117-0
Schraw, G. (2009). A conceptual analysis of five measures of metacognitive monitoring. Metacognition and Learning, 4, 33–45. https://doi.org/10.1007/s11409-008-9031-3