B3 — Do AI-safety engagement signals predict ranking?

Context

MATS cares about whether applicants are working on AI safety for the right reasons. The 10.0 application captured this in three ways: a Theory-of-Change (ToC) ranking alignment score (0–100) measuring how closely the applicant's ranking of AI risks matched a reference ordering; AI safety (AIS) engagement multi-select fields listing courses, programs, and orgs; and a free-text duration field for how long they have engaged with AI safety. Do any of these actually predict who gets ranked by a stream?

MATS (Machine Alignment, Transparency & Security) is an AI safety research fellowship that places ~120 fellows with ~100 mentors per cohort. Cohort 10.0 ran in summer 2026 and was the first cohort with a centralized application review instead of decentralized stream-specific review. This analysis is part of a broader effort to evaluate the 10.0 process and inform the design of 11.0 (autumn 2026).

How the 10.0 selection pipeline worked

The 10.0 pipeline in brief. ~2,200 people applied. Each applicant went through three stages:

  1. Stage 1 — submitted background / experience / motivation, picked which research tracks they were interested in (Empirical, Policy & Strategy, Technical Governance, Theory, Compute Infrastructure). An LLM screen filtered out applicants who clearly didn't meet a minimum bar, and produced advisory per-stream recommendations.
  2. Stage 2 — applicants who passed Stage 1 had their materials scored by LLM-graded rubrics. The empirical track used a composite score combining Research Skills, Technical Execution (split into MLE, SWE, Math sub-scores), and Soft Skills. The top ~600 by composite advanced to Stage 3.
  3. Stage 3 — applicants chose specific mentors / "streams" to apply to. Each stream reviewed its applicants and produced a ranked list. Top-ranked applicants got offers; lower-ranked got waitlisted. ~120 offers were made.

For the empirical track, the composite formula is 0.50·RS + 0.35·TE + 0.15·SS, where TE = 0.50·MLE + 0.30·SWE + 0.20·Math. A "relevance multiplier" (Direct=1.0 / Adjacent=0.85 / Distant=0.60) is applied to Research Skills based on how the applicant's experience matches the streams they applied to.
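
The weights above can be sketched in a few lines of Python. This is a minimal illustration, not the production scoring code: the function and argument names are hypothetical, all sub-scores are assumed to share one scale, and the relevance multiplier is assumed to scale Research Skills before the 0.50 weight is applied.

```python
# Relevance multiplier applied to Research Skills (values from the text above).
RELEVANCE = {"Direct": 1.0, "Adjacent": 0.85, "Distant": 0.60}


def technical_execution(mle, swe, math_score):
    """TE = 0.50*MLE + 0.30*SWE + 0.20*Math."""
    return 0.50 * mle + 0.30 * swe + 0.20 * math_score


def empirical_composite(rs, mle, swe, math_score, ss, relevance="Direct"):
    """Composite = 0.50*RS + 0.35*TE + 0.15*SS.

    Assumption: the multiplier scales RS before weighting.
    """
    adjusted_rs = rs * RELEVANCE[relevance]
    te = technical_execution(mle, swe, math_score)
    return 0.50 * adjusted_rs + 0.35 * te + 0.15 * ss


# Example with made-up sub-scores: RS=80, MLE=70, SWE=60, Math=50, SS=90.
# TE = 63; with an Adjacent match the composite is approximately 69.55.
print(empirical_composite(80, 70, 60, 50, 90, relevance="Adjacent"))
```

Under these assumptions, moving the same applicant from Direct to Adjacent lowers the composite by 0.50 × 0.15 × RS, so the multiplier matters most for applicants whose score is carried by Research Skills.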

Outcome definitions

Outcome definitions used throughout these analyses:

Why this matters

The 11.0 planning includes 'mission-alignment correlation' as an explicit workstream. Chris Ackerman's memo on graduating-fellow AI safety filtering argues that we should be filtering more aggressively on this; this analysis is partial evidence for or against that.

A prior analysis (May 2026, for the 11.0 application design) found Pearson r(ToC score, is_ranked) = +0.139 (p = 5e-11) and a roughly 4× quintile spread in P(ranked), from 3.8% in Q1 to 15.1% in Q5. We reproduce that result here.

⚠️ Caveat — AIS form bug in 10.0

The 10.0 application form for AI safety engagement had a UI bug: the secondary detail panels were swapped. When an applicant selected research program in the main multi-select, the form opened the structured course detail panel (and vice versa). This means the secondary detail fields — which specific programs / courses an applicant listed — are unreliable for any applicant who didn't select BOTH research program AND structured course in the main multi-select.

This analysis only uses the main multi-select count and the duration field, both of which are unaffected by the bug. But any downstream analysis that tries to use the specific-program or specific-course detail fields will need to filter to applicants who selected both flags in the main multi-select.
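
A downstream filter for the detail fields might look like the sketch below. The option labels and the function name are hypothetical placeholders; the real export may encode the main multi-select differently.

```python
# Hypothetical option labels for the two affected categories.
REQUIRED_FLAGS = {"research program", "structured course"}


def detail_fields_reliable(selected_categories):
    """Return True only if the applicant selected BOTH affected categories.

    With both selected, both detail panels were shown, so the swap
    cannot have hidden either one.
    """
    return REQUIRED_FLAGS <= set(selected_categories)


applicants = [
    {"id": 1, "ais_main": ["research program", "structured course"]},
    {"id": 2, "ais_main": ["research program"]},
    {"id": 3, "ais_main": []},
]
usable = [a["id"] for a in applicants if detail_fields_reliable(a["ais_main"])]
print(usable)  # [1]
```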

Headline

ToC alignment score has a modest but reliable association with is_ranked in the full pool: AUC = 0.646 [0.607, 0.688] (n = 2,199). The quintile spread is ~4× from bottom to top — broadly consistent with the May 2026 prior analysis.

In the Stage-3 empirical pool (range-restricted), the ToC signal attenuates (AUC 0.612 [0.562, 0.661], versus 0.646 in the full pool). Most of the full-pool effect comes from the Stage-1 → Stage-2 → Stage-3 gating: low-ToC applicants disproportionately don't reach Stage 3 in the first place.

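AIS engagement count and duration carry less signal than ToC. The count AUCs sit near 0.5 in both samples (0.525 in the full pool and 0.518 in Stage 3 for is_ranked). Duration does somewhat better for is_ranked (0.597 in the full pool, 0.573 in Stage 3) but is at chance for is_invited in Stage 3 (0.510 [0.461, 0.566]).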

AUC summary

| Sample | Predictor | Outcome | n | AUC | 95% CI |
|---|---|---|---|---|---|
| Full 10.0 pool | ToC alignment score | is_ranked | 2,199 | 0.646 | [0.607, 0.688] |
| Full 10.0 pool | ToC alignment score | is_invited | 2,199 | 0.586 | [0.558, 0.613] |
| Full 10.0 pool | AIS engagement count | is_ranked | 2,203 | 0.525 | [0.501, 0.547] |
| Full 10.0 pool | AIS engagement count | is_invited | 2,203 | 0.526 | [0.510, 0.541] |
| Full 10.0 pool | AIS prior duration (years) | is_ranked | 1,235 | 0.597 | [0.549, 0.646] |
| Full 10.0 pool | AIS prior duration (years) | is_invited | 1,235 | 0.560 | [0.525, 0.595] |
| Stage-3 empirical | ToC alignment score | is_ranked | 791 | 0.612 | [0.562, 0.661] |
| Stage-3 empirical | ToC alignment score | is_invited | 791 | 0.553 | [0.514, 0.592] |
| Stage-3 empirical | AIS engagement count | is_ranked | 791 | 0.518 | [0.494, 0.542] |
| Stage-3 empirical | AIS engagement count | is_invited | 791 | 0.512 | [0.492, 0.534] |
| Stage-3 empirical | AIS prior duration (years) | is_ranked | 470 | 0.573 | [0.510, 0.632] |
| Stage-3 empirical | AIS prior duration (years) | is_invited | 470 | 0.510 | [0.461, 0.566] |
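
For readers who want to recompute figures like these, the sketch below shows one way to get an AUC and an interval using only the standard library. It assumes a rank-based (Mann-Whitney) AUC and a percentile bootstrap; the write-up does not state how its 95% CIs were computed, so treat the interval method, the function names, and the defaults as assumptions.

```python
import random


def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U); tied scores get average ranks."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (resamples applicants)."""
    n = len(scores)
    if sum(labels) in (0, n):
        raise ValueError("need both outcome classes")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[k] for k in idx]
        if 0 < sum(ys) < n:  # skip resamples that lost one class
            stats.append(auc([scores[k] for k in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data: one of the four positive/negative pairs is mis-ordered.
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Rows with a missing predictor would be dropped before calling either function, matching the per-predictor non-null subsets reported in the n column.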

ToC quintile breakdown

| Quintile | n | Mean ToC | P(ranked) | P(invited) |
|---|---|---|---|---|
| Q1 | 449 | 40.5 | 3.8% | 15.8% |
| Q2 | 438 | 57.9 | 5.3% | 18.5% |
| Q3 | 440 | 69.7 | 8.4% | 26.1% |
| Q4 | 449 | 81.2 | 10.5% | 26.9% |
| Q5 | 423 | 94.8 | 15.1% | 30.7% |
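
A breakdown of this shape can be produced with a simple rank-based split, sketched below. The function name is hypothetical, and the binning is an assumption: the unequal quintile sizes above suggest the original analysis cut on score values (so tied scores stay together), whereas this sketch splits by sorted position.

```python
def quintile_rates(scores, outcomes, n_bins=5):
    """Split applicants into n_bins groups by score rank.

    Returns one row per bin with its size, mean score, and outcome rate.
    """
    n = len(scores)
    if n < n_bins:
        raise ValueError("need at least one applicant per bin")
    order = sorted(range(n), key=lambda i: scores[i])
    rows = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        rows.append({
            "bin": f"Q{b + 1}",
            "n": len(idx),
            "mean_score": sum(scores[i] for i in idx) / len(idx),
            "rate": sum(outcomes[i] for i in idx) / len(idx),
        })
    return rows


# Toy data: ten applicants with scores 1..10 and binary outcomes.
toy_scores = list(range(1, 11))
toy_ranked = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
for row in quintile_rates(toy_scores, toy_ranked):
    print(row)
```

The top-to-bottom spread quoted in the headline is the ratio of the Q5 rate to the Q1 rate, here 15.1% / 3.8%, or about 4.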

Takeaways

  1. ToC alignment carries real signal, but most of it is captured before Stage 3 — by the time we're looking at stream rankings within Stage 3, the ToC score isn't doing much more work.
  2. Raw AIS engagement count is a weak predictor. Counting how many AI-safety programs someone has been involved with adds little. The intuition: it's a noisy proxy for actual commitment. People with one substantive AIS program look the same as people with three superficial ones in this measure.
  3. Duration is also weak. Free-text "X years" doesn't disambiguate depth of engagement from elapsed calendar time.
  4. For 11.0: the ToC ranking question is worth keeping — it's a cheap and effective screen. The flat AIS-engagement multi-select is probably not pulling its weight; consider replacing with something that captures depth (e.g., a structured "what did you build / publish / contribute?" prompt).

🔧 Debug: how the data was interpreted (safe to skip)

Sample. Two samples: full 10.0 pool (n=2,203 deduped) and Stage-3 empirical (n=791). Each AUC is computed on the per-predictor non-null subset.

Outcome variable(s). is_ranked (primary) and is_invited_to_worktest (secondary; shown as is_invited in the tables).

Predictor fields. [stage-1-toc] Ranking alignment score (numeric 0–100), [stage-1-ais] Prior AI safety/security engagement multi-select mapped to count of selected categories, and [stage-1-ais] Prior AI safety/security duration parsed via regex (handles 'X year(s)', 'X month(s)', 'X-Y years'; long free-text answers >80 chars dropped).
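
The duration parsing described above could look roughly like the following. This is an illustrative sketch, not the regex actually used: the patterns and function name are assumptions, and taking the midpoint of an 'X-Y years' range is a guess at a reasonable convention. Mixed answers such as "1 year 6 months" are not handled; only the first match is read.

```python
import re

# 'X-Y years' or 'X-Y months' (hyphen or en dash between the numbers).
_RANGE = re.compile(r"(\d+(?:\.\d+)?)\s*[-–]\s*(\d+(?:\.\d+)?)\s*(year|month)", re.I)
# 'X year(s)' or 'X month(s)'.
_SINGLE = re.compile(r"(\d+(?:\.\d+)?)\s*(year|month)", re.I)


def parse_duration_years(text, max_len=80):
    """Parse a free-text duration into years, or return None if unparseable."""
    if text is None:
        return None
    text = text.strip()
    if not text or len(text) > max_len:  # long free-text answers are dropped
        return None
    match = _RANGE.search(text)
    if match:
        # Assumption: represent a range by its midpoint.
        value = (float(match.group(1)) + float(match.group(2))) / 2
        unit = match.group(3)
    else:
        match = _SINGLE.search(text)
        if not match:
            return None
        value = float(match.group(1))
        unit = match.group(2)
    return value / 12 if unit.lower() == "month" else value


for answer in ["2 years", "6 months", "1-2 years", "since college"]:
    print(answer, "->", parse_duration_years(answer))
```

Answers that return None are the ones removed by the per-predictor listwise drop, which is why the duration rows have a much smaller n than the other predictors.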

Filters applied. Canonical dedup. Stage-3 empirical filter same as A1.

Missing-data handling. Per-predictor listwise drop.

Key assumptions / caveats.