B2 — Stream–applicant matching: how well does it work?

Context

Two related questions about how applicants and streams find each other.

(1) Are accepted applicants matching with their top choice? Each Stage-3 applicant submits a ranked preference list of streams. For the 108 applicants who accepted an offer, we can ask: was the offering stream their #1 pick, #2, #5? If most matches happen near the top of the preference list, the pref-rank system is doing its job. If they happen low or off-list, that suggests applicants are flexible or that matching happens through paths the preference list doesn't capture.

(2) Are applicants good at predicting which streams will pick them? At Stage 1, applicants optionally indicate which streams they're interested in. At Stage 3, they actually apply to streams. Some Stage-3 applications go to streams the applicant didn't flag at Stage 1 — 'added' streams. If those added streams reliably rank the applicant, that's evidence applicants and streams find each other even without explicit Stage-1 stream selection. That would support making Stage-1 stream questions optional / low-friction for 11.0.

MATS (Machine Alignment, Transparency & Security) is an AI safety research fellowship that places ~120 fellows with ~100 mentors per cohort. Cohort 10.0 ran in summer 2026 and was the first cohort with a centralized application review instead of decentralized stream-specific review. This analysis is part of a broader effort to evaluate the 10.0 process and inform the design of 11.0 (autumn 2026).

How the 10.0 selection pipeline worked

The 10.0 pipeline in brief. ~2,200 people applied. Each applicant went through three stages:

  1. Stage 1 — submitted background / experience / motivation, picked which research tracks they were interested in (Empirical, Policy & Strategy, Technical Governance, Theory, Compute Infrastructure). An LLM screen filtered out applicants who clearly didn't meet a minimum bar, and produced advisory per-stream recommendations.
  2. Stage 2 — applicants who passed Stage 1 had their materials scored by LLM-graded rubrics. The empirical track used a composite score combining Research Skills, Technical Execution (split into MLE, SWE, Math sub-scores), and Soft Skills. The top ~600 by composite advanced to Stage 3.
  3. Stage 3 — applicants chose specific mentors / "streams" to apply to. Each stream reviewed its applicants and produced a ranked list. Top-ranked applicants got offers; lower-ranked got waitlisted. ~120 offers were made.

For the empirical track, the composite formula is 0.50·RS + 0.35·TE + 0.15·SS, where TE = 0.50·MLE + 0.30·SWE + 0.20·Math. A "relevance multiplier" (Direct=1.0 / Adjacent=0.85 / Distant=0.60) is applied to Research Skills based on how the applicant's experience matches the streams they applied to.
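The weighting above can be written out as a short sketch. The function names, argument names, and the idea that all sub-scores share one numeric scale are assumptions for illustration, not the pipeline's actual code; the only things taken from the text are the weights, the multiplier values, and the fact that the multiplier is applied to Research Skills.

```python
# Illustrative sketch only: names and the shared score scale are assumptions.
RELEVANCE = {"Direct": 1.0, "Adjacent": 0.85, "Distant": 0.60}


def technical_execution(mle: float, swe: float, math: float) -> float:
    """TE = 0.50*MLE + 0.30*SWE + 0.20*Math."""
    return 0.50 * mle + 0.30 * swe + 0.20 * math


def empirical_composite(rs: float, mle: float, swe: float, math: float,
                        ss: float, relevance: str = "Direct") -> float:
    """Composite = 0.50*RS_adj + 0.35*TE + 0.15*SS.

    RS_adj is Research Skills scaled by the relevance multiplier,
    which is how the text describes the multiplier being applied.
    """
    rs_adj = RELEVANCE[relevance] * rs
    te = technical_execution(mle, swe, math)
    return 0.50 * rs_adj + 0.35 * te + 0.15 * ss
```

Because both sets of weights sum to 1, an applicant with identical sub-scores and a Direct match gets that same value back as the composite. With sub-scores of 8 everywhere, a Distant match lowers only the Research Skills term, giving a composite of about 6.4.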

Outcome definitions

Outcome definitions used throughout these analyses:

Important caveats

Q1 — Are committers matching with their top picks?

Of 108 committers (offered + accepted), 90 had their committed stream appear in their own preference list. The distribution:

Preference rank of committed stream    Committers
1                                      40 (37%)
2                                      21 (19%)
3                                      5 (5%)
4+                                     24 (22%)
(not in pref list)                     18 (17%)

40/108 (37%) committed to their #1 preference. 66/108 (61%) committed to a top-3 preference. The remaining 42 either matched with a lower-preference stream (24) or with a stream not in their preference list at all (18, often because they didn't submit a preference list).
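A minimal sketch of how the buckets in this distribution could be derived, followed by a consistency check of the reported percentages. The helper names and the representation of a preference list as an ordered Python list are assumptions, not the analysis code; the counts are the ones reported above.

```python
from collections import Counter


def rank_bucket(pref_list, committed):
    """Bucket a committer by where the committed stream sits in their list.

    pref_list: ordered stream names, top preference first (may be None or empty).
    Returns '1', '2', '3', '4+', or '(not in pref list)'.
    """
    prefs = list(pref_list or [])
    if committed not in prefs:
        return "(not in pref list)"
    rank = prefs.index(committed) + 1  # 1 = top preference
    return str(rank) if rank <= 3 else "4+"


def distribution(buckets):
    """Map each bucket to (count, whole-number percent of all committers)."""
    counts = Counter(buckets)
    total = sum(counts.values())
    return {b: (n, round(100 * n / total)) for b, n in counts.items()}


# Consistency check using the counts reported in this section.
reported = {"1": 40, "2": 21, "3": 5, "4+": 24, "(not in pref list)": 18}
total = sum(reported.values())
top3_share = round(100 * sum(reported[k] for k in ("1", "2", "3")) / total)
```

Here `total` comes to 108 and `top3_share` to 61, matching the figures quoted in the text. A missing preference list is treated the same as a list that does not contain the committed stream, which mirrors how the '(not in pref list)' bucket is described.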

Q2 — How much self-selection happens between Stage 1 and Stage 3?

Of 1,059 Stage-3 applicants:

A meaningful fraction of applicants ended up applying to, and being ranked by, streams they didn't pre-flag at Stage 1. This is partly an artifact of optional Stage-1 selection: of the Stage-3 applicants on Empirical/P&S/TG-only tracks, 52 of 538 (10%) didn't make any Stage-1 stream selections at all. For these applicants, every Stage-3 application is an "added" stream by definition.
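The "added" definition is a set difference, which makes the empty-selection artifact easy to see. The sketch below uses hypothetical function names and placeholder stream labels; it is not the analysis code.

```python
def added_streams(stage1_flags, stage3_apps):
    """Streams applied to at Stage 3 that were not flagged at Stage 1."""
    return set(stage3_apps) - set(stage1_flags or ())


def added_and_ranked(stage1_flags, stage3_apps, ranked_by):
    """Added streams that went on to rank the applicant."""
    return added_streams(stage1_flags, stage3_apps) & set(ranked_by)


# Placeholder stream labels, for illustration only.
no_stage1 = added_streams([], ["stream_a", "stream_b"])
some_stage1 = added_and_ranked(["stream_a"],
                               ["stream_a", "stream_b", "stream_c"],
                               ranked_by=["stream_b"])
```

With an empty Stage-1 selection, `no_stage1` contains every Stage-3 application, which is the artifact described above. Excluding applicants with an empty Stage-1 set before counting is one way to separate that artifact from genuine late discovery.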

Takeaways

  1. Most matches happen at or near the top of the preference list. ~61% of committers got an offer from a top-3 preference. The preference-list system reflects real matching, not random assignment.
  2. A substantial minority of applicants reach matches with streams they didn't initially flag at Stage 1. This is partly because Stage-1 stream selection was optional for most tracks. But even excluding that artifact, the pattern suggests applicants and streams can find each other late in the process.
  3. For 11.0: These results support making the Stage-1 stream questions more clearly optional (or dropping them for most applicants). Applicants do not appear to be meaningfully advantaged by signaling early, and the Stage-3 application stage is where matching actually happens.

🔧 Debug: how the data was interpreted (safe to skip)

Sample. Canonical 10.0 sample. Q1 uses the 108 applicants with a non-null [offers] Accepted stream (offered + accepted). Q2 uses the 1,059 Stage-3 applicants.

Outcome variable(s). Q1: position of committed stream in applicant's [stage-3-stream] Stream ranking list (1 = top preference). Q2: did any 'added at Stage 3' stream end up ranking the applicant?

Predictor fields. Descriptive.

Filters applied. Canonical dedup. No further exclusions.

Missing-data handling. Q1: applicants without a preference list (n=18) are shown in the '(not in pref list)' bucket. Q2: empty Stage-1 stream sets are kept as empty sets, so every Stage-3 application from those applicants counts as 'added at S3'. This is a UI artifact noted in the caveats.

Key assumptions / caveats.