Each cohort grades fellows on a research plan they produce during the program — 7.0 and 8.0 call it the SRP (Scholar Research Plan); 9.0 calls it the FRP (Final Research Plan). The plans are scored by reviewers. Are these scores a useful program-quality signal? Specifically: do they correlate with mentor evaluations? With post-program publications?
If they correlate, SRP/FRP is a usable intermediate quality metric and worth investing in. If they don't, the SRP/FRP grading is essentially noise and the program should reconsider whether it's worth the cost.
MATS (Machine Alignment, Transparency & Security) is an AI safety research fellowship that places ~120 fellows with ~100 mentors per cohort. Cohort 10.0 ran in summer 2026 and was the first cohort with a centralized application review instead of decentralized stream-specific review. This analysis is part of a broader effort to evaluate the 10.0 process and inform the design of 11.0 (autumn 2026).
The 10.0 pipeline in brief. ~2,200 people applied. Each applicant went through three stages:
For the empirical track, the composite formula is 0.50·RS + 0.35·TE + 0.15·SS, where TE = 0.50·MLE + 0.30·SWE + 0.20·Math. A "relevance multiplier" (Direct=1.0 / Adjacent=0.85 / Distant=0.60) is applied to Research Skills based on how the applicant's experience matches the streams they applied to.
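As a worked illustration of the empirical-track formula, the sketch below computes the composite from its parts. It is a minimal sketch under stated assumptions, not the production scoring code: the function names are hypothetical, all inputs are assumed to sit on a common numeric scale, and the relevance multiplier is assumed to scale RS before the 0.50 weight is applied (the text says only that it is "applied to Research Skills").

```python
# Relevance multipliers as given in the text.
RELEVANCE_MULTIPLIER = {"Direct": 1.0, "Adjacent": 0.85, "Distant": 0.60}


def te_score(mle: float, swe: float, math_score: float) -> float:
    """TE = 0.50*MLE + 0.30*SWE + 0.20*Math."""
    return 0.50 * mle + 0.30 * swe + 0.20 * math_score


def empirical_composite(rs: float, mle: float, swe: float, math_score: float,
                        ss: float, relevance: str = "Direct") -> float:
    """Composite = 0.50*RS + 0.35*TE + 0.15*SS.

    ASSUMPTION: the relevance multiplier scales RS before weighting.
    """
    rs_adjusted = rs * RELEVANCE_MULTIPLIER[relevance]
    te = te_score(mle, swe, math_score)
    return 0.50 * rs_adjusted + 0.35 * te + 0.15 * ss


if __name__ == "__main__":
    # Same sub-scores, different relevance tiers (illustrative values only).
    for tier in ("Direct", "Adjacent", "Distant"):
        print(tier, round(empirical_composite(4, 4, 4, 4, 4, tier), 3))
```

Under this reading, moving an applicant from Direct to Distant removes 40% of the Research Skills contribution, which is 20% of the maximum composite for an applicant with uniform sub-scores.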
Outcome definitions used throughout these analyses:
- is_ranked (primary outcome): applicant was ranked by ≥1 stream. This is the cleanest signal of "the selection process picked this person." Not the same as "received an offer": offer count is bounded by cohort size (~120), but rank count reflects quality independently of capacity.
- is_invited_to_worktest (secondary outcome): applicant was engaged by ≥1 stream in any way (invited to a work test, invited to an interview, ranked, or sent the Megastream takehome). Strict superset of is_ranked. One level above is_ranked in the funnel.
- passed_mentors_bar: applicant was offered or waitlisted. In 10.0, this equals is_ranked exactly (every ranked person got either an offer or a waitlist slot).

Cross-cohort comparison of raw scores doesn't make sense; we look at rank-based correlations only.

| Cohort | n joined | ρ(SRP/FRP, mentor composite) |
|---|---|---|
| 7.0 | 72 | +0.23 |
| 8.0 | 98 | +0.11 |
| 9.0 | 86 | +0.06 |

| Cohort | n | ρ(SRP/FRP, has_pub) | ρ(SRP/FRP, n_pubs) |
|---|---|---|---|
| 7.0 | 76 | +0.08 | +0.03 |
| 8.0 | 100 | -0.03 | -0.04 |
| 9.0 | 87 | +0.04 | +0.04 |
Sample. Per cohort: SRP/FRP per-person final scores (averaged across team rows) joined to (a) mentor-eval composite (also averaged per person), and (b) alumni publication tracker.
Outcome variable(s). Mentor-eval composite + post-program publication outcomes.
Predictor fields. SRP/FRP final_score — within-cohort raw score (different rubrics across cohorts; only rank-correlations reported).
Filters applied. Inner join on person_id.
Missing-data handling. Listwise drop per correlation.
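The procedure described above (average to one score per person, inner join on person_id, listwise drop, rank correlation) can be sketched as follows. This is a standard-library illustration, not the analysis code itself: the function names and the toy rows are hypothetical, and Spearman's ρ is computed as the Pearson correlation of average ranks, which is the usual definition when ties are present.

```python
from collections import defaultdict
from math import sqrt


def mean_per_person(rows):
    """Collapse repeated (person_id, score) rows into one mean score per person."""
    buckets = defaultdict(list)
    for person_id, score in rows:
        if score is not None:  # skip missing scores
            buckets[person_id].append(score)
    return {pid: sum(v) / len(v) for pid, v in buckets.items()}


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else float("nan")


def spearman_joined(plan_scores, outcome):
    """Inner-join two {person_id: value} maps, drop missing pairs, return (n, rho)."""
    shared = [p for p in sorted(set(plan_scores) & set(outcome))
              if plan_scores[p] is not None and outcome[p] is not None]
    x = [plan_scores[p] for p in shared]
    y = [outcome[p] for p in shared]
    return len(shared), pearson(average_ranks(x), average_ranks(y))


# Toy data (made up): person "a" has two team rows; "d" and "e" fail the join.
plan_rows = [("a", 3.0), ("a", 5.0), ("b", 2.0), ("c", 4.5), ("d", 1.0)]
mentor_rows = [("a", 7.0), ("b", 6.0), ("c", 9.0), ("e", 8.0)]
n, rho = spearman_joined(mean_per_person(plan_rows), mean_per_person(mentor_rows))
```

Because the listwise drop happens inside each call, the same plan-score map can be passed against the mentor composite, has_pub, and n_pubs in turn, and each correlation reports its own n. That is consistent with the slightly different n values in the two tables.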
Key assumptions / caveats.