B1 — How many applicants made it to each stage?

Context

Before asking any predictive question, it's useful to see the raw shape of the funnel. This analysis is a descriptive snapshot of how many applicants made it to each stage, both overall and broken down by which track they selected at Stage 1. Conversion rates here also serve as baselines for the rest of Part B (e.g., when we ask later 'do returning applicants have higher ranking rates than first-timers?', the all-applicant rate in this analysis is the comparison).

Track selections at Stage 1 are multi-select, so per-track funnels overlap — a person who selected Empirical AND Theory appears in both.
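
As a rough illustration of the overlap, the sketch below counts per-track membership for a few made-up applicants and then checks the real per-track "Applied" counts against the canonical total. The applicant IDs and their selections are hypothetical; only the track names and the real counts come from this analysis.

```python
from collections import Counter

# Hypothetical toy applicants: IDs and selections are invented for illustration.
toy_selections = {
    "a1": {"Empirical", "Theory"},
    "a2": {"Empirical"},
    "a3": {"Policy & Strategy", "Technical Governance"},
    "a4": {"Theory"},
}

def per_track_counts(selections):
    """Count applicants per track; a multi-select applicant is counted once per track."""
    counts = Counter()
    for tracks in selections.values():
        counts.update(tracks)
    return counts

toy_counts = per_track_counts(toy_selections)
toy_memberships = sum(toy_counts.values())  # track memberships, not people

# Real "Applied" counts per track, copied from the per-track funnel table.
applied_by_track = {
    "Empirical": 1683,
    "Policy & Strategy": 445,
    "Technical Governance": 373,
    "Theory": 687,
    "Compute Infrastructure": 277,
}
total_memberships = sum(applied_by_track.values())
canonical_applicants = 2203

print(toy_memberships, len(toy_selections))
print(total_memberships, canonical_applicants)
```

Because the per-track counts sum to more than the number of canonical applicants, per-track rows should not be added together to recover the overall funnel.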

MATS (Machine Alignment, Transparency & Security) is an AI safety research fellowship that places ~120 fellows with ~100 mentors per cohort. Cohort 10.0 ran in summer 2026 and was the first cohort with a centralized application review instead of decentralized stream-specific review. This analysis is part of a broader effort to evaluate the 10.0 process and inform the design of 11.0 (autumn 2026).

How the 10.0 selection pipeline worked

The 10.0 pipeline in brief. ~2,200 people applied. The pipeline had three stages, and only applicants who cleared one stage moved on to the next:

  1. Stage 1 — submitted background / experience / motivation, picked which research tracks they were interested in (Empirical, Policy & Strategy, Technical Governance, Theory, Compute Infrastructure). An LLM screen filtered out applicants who clearly didn't meet a minimum bar, and produced advisory per-stream recommendations.
  2. Stage 2 — applicants who passed Stage 1 had their materials scored by LLM-graded rubrics. The empirical track used a composite score combining Research Skills, Technical Execution (split into MLE, SWE, Math sub-scores), and Soft Skills. The top ~600 by composite advanced to Stage 3.
  3. Stage 3 — applicants chose specific mentors / "streams" to apply to. Each stream reviewed its applicants and produced a ranked list. Top-ranked applicants got offers; lower-ranked got waitlisted. ~120 offers were made.

For the empirical track, the composite formula is 0.50·RS + 0.35·TE + 0.15·SS, where TE = 0.50·MLE + 0.30·SWE + 0.20·Math. A "relevance multiplier" (Direct=1.0 / Adjacent=0.85 / Distant=0.60) is applied to Research Skills based on how the applicant's experience matches the streams they applied to.
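
The formula above can be sketched as follows. This is a minimal illustration, not the production scorer: the function names, the example scores, and the 0-10 scale are hypothetical, and it assumes the relevance multiplier scales Research Skills before the 0.50 weight is applied.

```python
# Relevance multipliers as stated in the text.
RELEVANCE = {"Direct": 1.0, "Adjacent": 0.85, "Distant": 0.60}

def technical_execution(mle, swe, math_score):
    """TE = 0.50*MLE + 0.30*SWE + 0.20*Math."""
    return 0.50 * mle + 0.30 * swe + 0.20 * math_score

def empirical_composite(rs, mle, swe, math_score, ss, relevance="Direct"):
    """Composite = 0.50*RS + 0.35*TE + 0.15*SS.

    Assumption: the relevance multiplier is applied to RS before weighting.
    """
    adjusted_rs = rs * RELEVANCE[relevance]
    te = technical_execution(mle, swe, math_score)
    return 0.50 * adjusted_rs + 0.35 * te + 0.15 * ss

# Hypothetical applicant scored on an assumed 0-10 scale.
direct = empirical_composite(rs=8, mle=7, swe=6, math_score=5, ss=9, relevance="Direct")
adjacent = empirical_composite(rs=8, mle=7, swe=6, math_score=5, ss=9, relevance="Adjacent")
print(round(direct, 3), round(adjacent, 3))
```

Under these assumptions, moving the same applicant from Direct to Adjacent lowers the composite by 0.50 × 0.15 × RS, since only the Research Skills term is affected.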

Outcome definitions

Outcome definitions used throughout these analyses:

Headline

Out of 2,203 canonical applications, 1,991 passed Stage 1 (90.4%), 804 reached Stage 3 (36.5%), 189 were ranked by ≥1 stream (8.6%), and 126 received offers (5.7%).

The two biggest filtering stages are Stage 2 → Stage 3 (composite-based, drops 1,187 applicants — ~60% of those who passed Stage 1) and Stage 3 → Ranked (stream-side review, drops 615 applicants — ~76% of Stage-3 entrants).

Overall funnel

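Stage                  n       % of applied   % of previous stage
Applied                2,203   100.0%         100.0%
Passed Stage 1         1,991   90.4%          90.4%
Reached Stage 3        804     36.5%          40.4%
Engaged by ≥1 stream   519     23.6%          64.6%
Ranked by ≥1 stream    189     8.6%           36.4%
Offered                126     5.7%           66.7%
Waitlisted             63      2.9%           50.0%

Note: Offered and Waitlisted are the two outcomes for ranked applicants (126 + 63 = 189), not consecutive stages. The 50.0% in the Waitlisted row is relative to the Offered row above it and is not a pass-through rate; as a share of ranked applicants, Waitlisted is 33.3%.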
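
The percentages in the overall funnel can be re-derived from the stage counts alone. The sketch below does that arithmetic; the function and variable names are illustrative, and Waitlisted is handled separately as a share of ranked applicants because it is an outcome alongside Offered rather than a later stage.

```python
# Stage counts copied from the overall funnel table, in funnel order.
STAGES = [
    ("Applied", 2203),
    ("Passed Stage 1", 1991),
    ("Reached Stage 3", 804),
    ("Engaged by >=1 stream", 519),
    ("Ranked by >=1 stream", 189),
    ("Offered", 126),
]

def funnel_rows(stages):
    """Return (stage, n, % of applied, % of previous stage) for each stage."""
    total = stages[0][1]
    previous = total
    rows = []
    for name, n in stages:
        rows.append((name, n, round(100 * n / total, 1), round(100 * n / previous, 1)))
        previous = n
    return rows

rows = funnel_rows(STAGES)
for name, n, pct_applied, pct_previous in rows:
    print(f"{name:<24}{n:>6}{pct_applied:>8.1f}%{pct_previous:>8.1f}%")

# Waitlisted is the complement of Offered among ranked applicants.
ranked, offered, waitlisted = 189, 126, 63
waitlisted_share_of_ranked = round(100 * waitlisted / ranked, 1)
```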

Per-track funnels

Track                    Applied   Passed S1   Reached S3   Ranked   Offered
Empirical                1,683     1,592       638          147      107
Policy & Strategy        445       402         201          30       17
Technical Governance     373       341         157          25       20
Theory                   687       600         229          75       36
Compute Infrastructure   277       214         69           14       6
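
The per-track table reports counts only, so stage-to-stage conversion rates have to be derived. A minimal sketch of that derivation follows; the helper names and rate labels are illustrative, and the counts are copied from the table above.

```python
# (applied, passed S1, reached S3, ranked, offered) per track, from the table.
TRACKS = {
    "Empirical": (1683, 1592, 638, 147, 107),
    "Policy & Strategy": (445, 402, 201, 30, 17),
    "Technical Governance": (373, 341, 157, 25, 20),
    "Theory": (687, 600, 229, 75, 36),
    "Compute Infrastructure": (277, 214, 69, 14, 6),
}

def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)

def step_rates(applied, passed_s1, reached_s3, ranked, offered):
    """Stage-to-stage conversion rates plus the end-to-end offer rate."""
    return {
        "applied->S1": pct(passed_s1, applied),
        "S1->S3": pct(reached_s3, passed_s1),
        "S3->ranked": pct(ranked, reached_s3),
        "ranked->offered": pct(offered, ranked),
        "applied->offered": pct(offered, applied),
    }

rates = {track: step_rates(*counts) for track, counts in TRACKS.items()}
for track, track_rates in rates.items():
    print(track, track_rates)
```

Because track selection is multi-select, these rates describe overlapping groups of applicants and are not independent comparisons between tracks.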

Takeaways

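  1. Stage 2 removes the most applicants in absolute terms: ~60% of Stage-1-passers (1,187) are filtered out by composite. Stream-side review has the higher drop rate (~76% of Stage-3 entrants) but acts on a smaller pool. The Stage 2 cut is intentional: 10.0 over-recruited at Stage 1 and used the composite to compress the pool to a stream-reviewable size.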
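  2. The Empirical track is the dominant funnel in raw count (1,683 applicants, ~2.4× the next biggest, Theory, at 687). Empirical has the highest Stage 1 pass rate (94.6%) and the highest applied-to-offer rate (6.4%), but it does not lead at every stage: Policy & Strategy and Technical Governance convert better from Stage 1 to Stage 3 (50.0% and 46.0% vs 40.1%), and Theory converts better from Stage 3 to ranked (32.8% vs 23.0%). Compute Infrastructure has the lowest pass-through at most stages and the lowest applied-to-offer rate (2.2%).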
  3. Stream-side review keeps most who reach Stage 3 for the "engaged" outcome (invited to work-test or interview): 519 of 804, or 64.6%. Only ~24% of Stage-3 entrants (189) end up ranked, and of those, 126 received offers and 63 were waitlisted.

🔧 Debug: how the data was interpreted (safe to skip)

Sample. Canonical 10.0 sample (deduped, n=2,203). Per-track masks based on [stage-1-track] Selected tracks.

Outcome variable(s). Stages derived from Furthest stage reached: Stage 1 / Stage 2 / Stage 3 / Offered / Waitlisted. 'Engaged by ≥1 stream' = is_invited_to_worktest (includes Megastream takehome path + ranking).

Predictor fields. N/A — descriptive.

Filters applied. Canonical dedup applied (7 person_ids had 2 rows in apps_10; kept the row with furthest stage).
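
The dedup rule can be sketched as below. This is an illustration under stated assumptions, not the actual pipeline code: `person_id` comes from the text, but the `furthest_stage` key, the sample rows, and the relative ordering of Offered above Waitlisted are assumptions made for the example.

```python
# Assumed ordering of "Furthest stage reached" values; Offered > Waitlisted is a guess.
STAGE_ORDER = {"Stage 1": 1, "Stage 2": 2, "Stage 3": 3, "Waitlisted": 4, "Offered": 5}

def canonical_rows(rows):
    """Keep one row per person_id: the one with the furthest stage reached."""
    best = {}
    for row in rows:
        pid = row["person_id"]
        current = best.get(pid)
        if current is None or (
            STAGE_ORDER[row["furthest_stage"]] > STAGE_ORDER[current["furthest_stage"]]
        ):
            best[pid] = row
    return list(best.values())

# Hypothetical rows: person p1 appears twice, as the 7 duplicated person_ids did.
sample = [
    {"person_id": "p1", "furthest_stage": "Stage 1"},
    {"person_id": "p2", "furthest_stage": "Stage 2"},
    {"person_id": "p1", "furthest_stage": "Stage 3"},
]
deduped = canonical_rows(sample)
print(deduped)
```

If two rows for the same person tie on stage, this sketch keeps the first one encountered; the source does not say how ties were resolved.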

Missing-data handling. No imputation needed; Furthest stage reached is non-null for all rows.

Key assumptions / caveats.