MATS has changed its selection process across cohorts: 7.0 and 8.0 were fully decentralized (each stream reviewed its own applicants), 9.0 added partial centralized review on top, and 10.0 went fully centralized. Did selection quality improve as a result?
This is a hard question because we can't observe quality directly — we observe proxies (mentor evals, SRP/FRP scores, post-program publications), each measured with different instruments across cohorts. We compare relative shape rather than raw levels, and flag caveats throughout.
MATS (Machine Alignment, Transparency & Security) is an AI safety research fellowship that places ~120 fellows with ~100 mentors per cohort. Cohort 10.0 ran in summer 2026 and was the first cohort with a centralized application review instead of decentralized stream-specific review. This analysis is part of a broader effort to evaluate the 10.0 process and inform the design of 11.0 (autumn 2026).
The 10.0 pipeline in brief. ~2,200 people applied. Each applicant went through three stages:
For the empirical track, the composite formula is 0.50·RS + 0.35·TE + 0.15·SS, where TE = 0.50·MLE + 0.30·SWE + 0.20·Math. A "relevance multiplier" (Direct=1.0 / Adjacent=0.85 / Distant=0.60) is applied to Research Skills based on how the applicant's experience matches the streams they applied to.
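As a minimal sketch, the composite above can be written out as follows. The weights and multiplier values come from the formula in the text. The function names, argument names, the assumption that all subscores share one numeric scale, and the choice to apply the multiplier to RS before weighting are illustrative assumptions, not the pipeline's actual code.

```python
# Multiplier values as stated in the text; the dict name is hypothetical.
RELEVANCE_MULTIPLIER = {"Direct": 1.0, "Adjacent": 0.85, "Distant": 0.60}


def te_score(mle, swe, math):
    """TE = 0.50*MLE + 0.30*SWE + 0.20*Math."""
    return 0.50 * mle + 0.30 * swe + 0.20 * math


def empirical_composite(rs, mle, swe, math, ss, relevance="Direct"):
    """Composite = 0.50*RS + 0.35*TE + 0.15*SS for the empirical track.

    Assumption: the relevance multiplier scales RS before the 0.50 weight
    is applied, and every subscore is on the same scale.
    """
    rs_adjusted = rs * RELEVANCE_MULTIPLIER[relevance]
    te = te_score(mle, swe, math)
    return 0.50 * rs_adjusted + 0.35 * te + 0.15 * ss


if __name__ == "__main__":
    # Same hypothetical applicant under two relevance levels.
    for level in ("Direct", "Adjacent"):
        print(level, round(empirical_composite(8, 7, 6, 5, 9, level), 3))
```

Under these assumptions, moving an applicant from Direct to Adjacent lowers only the RS term, so the composite drops by 0.50 · 0.15 · RS.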
Outcome definitions used throughout these analyses:
- is_ranked (primary outcome): applicant was ranked by ≥1 stream. This is the cleanest signal of "the selection process picked this person." Not the same as "received an offer": offer count is bounded by cohort size (~120), but rank count reflects quality independently of capacity.
- is_invited_to_worktest (secondary outcome): applicant was engaged by ≥1 stream in any way: invited to a work test, invited to an interview, ranked, or sent the Megastream takehome. Strict superset of is_ranked. One level above is_ranked in the funnel.
- passed_mentors_bar: applicant was offered or waitlisted. In 10.0, this equals is_ranked exactly (every ranked person got either an offer or a waitlist slot).

| Cohort | n | Mean | Median | P25 | P75 | Frac ≥8/10 |
|---|---|---|---|---|---|---|
| 6.0 | 95 | 7.20 | 7.25 | 6.38 | 8.12 | 22% |
| 7.0 | 76 | 7.34 | 7.38 | 6.50 | 8.00 | 33% |
| 8.0 | 106 | 7.16 | 7.25 | 6.56 | 7.75 | 25% |
| 9.0 | 93 | 7.52 | 7.50 | 6.75 | 8.25 | 40% |

| Latest cohort | n | P(has ≥1 pub) | Median n_pubs |
|---|---|---|---|
| 5.0 | 26 | 54% | 1 |
| 5.1 | 25 | 76% | 1 |
| 6.0 | 42 | 62% | 1 |
| 6.1 | 33 | 94% | 3 |
| 7.0 | 27 | 70% | 2 |
| 7.1 | 46 | 80% | 2 |
| 8.0 | 21 | 48% | 0 |
| 8.1 | 75 | 63% | 1 |
| 9.0 | 87 | 14% | 0 |

There is a heavy recency confound here: 6.0 alumni have had ~2 years to publish, 9.0 alumni only ~6 months. The apparent decline in publication rate is largely time-driven, not quality-driven.
Raw cohort means of the SRP/FRP final_score: 7.0=78.3, 8.0=79.8, 9.0=2.8. Cross-cohort comparison of raw scores is not meaningful because the cohorts used different rubrics. Within-cohort percentile is what we use for cross-cohort analyses elsewhere (e.g., C2, C3 implicitly).
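To illustrate why within-cohort percentiles sidestep the rubric change, here is a small sketch that ranks each score only against others in its own cohort. The function name, the (cohort, score) input shape, and the mid-rank tie convention are assumptions for illustration; the percentile definition actually used in the other analyses may differ.

```python
from collections import defaultdict


def within_cohort_percentile(rows):
    """Return one percentile in [0, 100] per (cohort, score) row.

    Uses the mid-rank convention: share of same-cohort scores strictly
    below, plus half the share of ties (the score itself included).
    """
    by_cohort = defaultdict(list)
    for cohort, score in rows:
        by_cohort[cohort].append(score)

    percentiles = []
    for cohort, score in rows:
        peers = by_cohort[cohort]
        below = sum(s < score for s in peers)
        equal = sum(s == score for s in peers)
        percentiles.append(100.0 * (below + 0.5 * equal) / len(peers))
    return percentiles


if __name__ == "__main__":
    # Made-up scores on two very different scales, mimicking a rubric change.
    rows = [("7.0", 70), ("7.0", 80), ("7.0", 90),
            ("9.0", 2.0), ("9.0", 3.0), ("9.0", 4.0)]
    for row, pct in zip(rows, within_cohort_percentile(rows)):
        print(row, round(pct, 1))
```

Because each score is compared only with its cohort peers, the middle score in each made-up cohort lands at the same percentile even though the raw scales differ by more than an order of magnitude.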
Sample. Mentor evals: all rows in each cohort's mentor_X table. SRP/FRP: one table per cohort. Publication rates: alumni_pubs, with each alum attributed to their latest cohort.
Outcome variable(s). Mentor: mean of four standardized dimensions (domain skill, research execution, AI safety knowledge, mission alignment). SRP/FRP: raw final_score. Publication: has_pub binary, n_pubs count.
Predictor fields. N/A — descriptive cross-cohort comparison.
Filters applied. None beyond per-table standard load.
Missing-data handling. No imputation; null rows excluded from per-cohort summaries.
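The per-cohort summaries described above (n, mean, median, quartiles, and the fraction at or above 8/10, with null rows dropped and nothing imputed) can be sketched as below. The function name and dictionary keys are hypothetical, and the inclusive quantile method is an assumption, since the writeup does not say which percentile definition was used.

```python
import statistics


def cohort_summary(scores, threshold=8.0):
    """Summarize one cohort's scores, excluding None values (no imputation)."""
    vals = [s for s in scores if s is not None]
    if not vals:
        return None  # nothing to summarize for this cohort

    if len(vals) >= 2:
        # Quartile cut points; "inclusive" treats the data as the full population.
        p25, _, p75 = statistics.quantiles(vals, n=4, method="inclusive")
    else:
        p25 = p75 = vals[0]

    return {
        "n": len(vals),
        "mean": statistics.fmean(vals),
        "median": statistics.median(vals),
        "p25": p25,
        "p75": p75,
        "frac_ge_threshold": sum(v >= threshold for v in vals) / len(vals),
    }


if __name__ == "__main__":
    # Made-up scores; the None row is dropped, not filled in.
    print(cohort_summary([6.0, None, 7.0, 8.0, 9.0]))
```

Note that n counts only non-null rows, so a cohort's n in a summary table can be smaller than the number of rows in its source table.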
Key assumptions / caveats.