D6 — SRP/FRP signal

Context

Each cohort grades fellows on a research plan they produce during the program — 7.0 and 8.0 call it the SRP (Scholar Research Plan); 9.0 calls it the FRP (Final Research Plan). The plans are scored by reviewers. Are these scores a useful program-quality signal? Specifically: do they correlate with mentor evaluations? With post-program publications?

If they correlate, SRP/FRP is a usable intermediate quality metric and worth investing in. If they don't, the SRP/FRP grading is essentially noise and the program should reconsider whether it's worth the cost.

MATS (Machine Alignment, Transparency & Security) is an AI safety research fellowship that places ~120 fellows with ~100 mentors per cohort. Cohort 10.0 ran in summer 2026 and was the first cohort with a centralized application review instead of decentralized stream-specific review. This analysis is part of a broader effort to evaluate the 10.0 process and inform the design of 11.0 (autumn 2026).

How the 10.0 selection pipeline worked

The 10.0 pipeline in brief. ~2,200 people applied. Each applicant went through three stages:

Stage 1 — submitted background / experience / motivation, picked which research tracks they were interested in (Empirical, Policy & Strategy, Technical Governance, Theory, Compute Infrastructure). An LLM screen filtered out applicants who clearly didn't meet a minimum bar, and produced advisory per-stream recommendations.
Stage 2 — applicants who passed Stage 1 had their materials scored by LLM-graded rubrics. The empirical track used a composite score combining Research Skills, Technical Execution (split into MLE, SWE, Math sub-scores), and Soft Skills. The top ~600 by composite advanced to Stage 3.
Stage 3 — applicants chose specific mentors / "streams" to apply to. Each stream reviewed its applicants and produced a ranked list. Top-ranked applicants got offers; lower-ranked got waitlisted. ~120 offers were made.

For the empirical track, the composite formula is 0.50·RS + 0.35·TE + 0.15·SS, where TE = 0.50·MLE + 0.30·SWE + 0.20·Math. A "relevance multiplier" (Direct=1.0 / Adjacent=0.85 / Distant=0.60) is applied to Research Skills based on how the applicant's experience matches the streams they applied to.
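As a minimal sketch of the formula above, the composite can be computed as follows. The class and field names, the 0-10 score scale, and the example applicant are illustrative assumptions, not part of the actual pipeline. The sketch also assumes the relevance multiplier scales Research Skills before the 0.50 weight is applied, which is how the description reads.

```python
from dataclasses import dataclass

# Relevance multipliers as stated in the text.
RELEVANCE = {"Direct": 1.0, "Adjacent": 0.85, "Distant": 0.60}


@dataclass
class EmpiricalScores:
    """Rubric scores for one applicant (hypothetical field names, assumed 0-10 scale)."""
    research_skills: float
    mle: float
    swe: float
    math: float
    soft_skills: float
    relevance: str  # one of the keys in RELEVANCE


def technical_execution(s: EmpiricalScores) -> float:
    # TE = 0.50*MLE + 0.30*SWE + 0.20*Math
    return 0.50 * s.mle + 0.30 * s.swe + 0.20 * s.math


def composite(s: EmpiricalScores) -> float:
    # Assumption: the multiplier is applied to Research Skills before weighting.
    rs = s.research_skills * RELEVANCE[s.relevance]
    # Composite = 0.50*RS + 0.35*TE + 0.15*SS
    return 0.50 * rs + 0.35 * technical_execution(s) + 0.15 * s.soft_skills


# Made-up applicant, for illustration only.
example = EmpiricalScores(
    research_skills=8.0, mle=7.0, swe=6.0, math=5.0,
    soft_skills=9.0, relevance="Adjacent",
)
print(round(composite(example), 3))
```

By hand, for this made-up applicant: TE = 3.5 + 1.8 + 1.0 = 6.3, adjusted RS = 8.0 × 0.85 = 6.8, so the composite is 3.4 + 2.205 + 1.35 = 6.955.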

Outcome definitions

Outcome definitions used throughout these analyses:

is_ranked (primary outcome) — applicant was ranked by ≥1 stream. This is the cleanest signal of "the selection process picked this person." Not the same as "received an offer" — offer count is bounded by cohort size (~120), but rank count reflects quality independently of capacity.
is_invited_to_worktest (secondary outcome) — applicant was engaged by ≥1 stream in any way: invited to a work test, invited to an interview, ranked, or sent the Megastream takehome. Strict superset of is_ranked. One level above is_ranked in the funnel.
passed_mentors_bar — applicant was offered or waitlisted. In 10.0, this equals is_ranked exactly (every ranked person got either an offer or a waitlist slot).
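The three outcome flags can be derived from per-stream records roughly as sketched below. The `StreamRecord` type and all of its field names are hypothetical; the real data schema is not described here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StreamRecord:
    """One (applicant, stream) row. All field names are hypothetical."""
    ranked: bool = False
    worktest_invite: bool = False
    interview_invite: bool = False
    megastream_takehome: bool = False
    offered: bool = False
    waitlisted: bool = False


def outcomes(records: List[StreamRecord]) -> Dict[str, bool]:
    """Collapse one applicant's per-stream rows into the three outcome flags."""
    # Ranked by at least one stream.
    is_ranked = any(r.ranked for r in records)
    # Engaged by at least one stream in any way; ranking counts as engagement,
    # so this flag is true whenever is_ranked is true.
    is_invited = any(
        r.ranked or r.worktest_invite or r.interview_invite or r.megastream_takehome
        for r in records
    )
    # Offered or waitlisted by at least one stream.
    passed = any(r.offered or r.waitlisted for r in records)
    return {
        "is_ranked": is_ranked,
        "is_invited_to_worktest": is_invited,
        "passed_mentors_bar": passed,
    }


# Made-up applicant: work-tested by one stream, never ranked.
print(outcomes([StreamRecord(worktest_invite=True), StreamRecord()]))
```

Because `ranked` is one of the conditions for `is_invited_to_worktest`, the superset relationship holds by construction. The equality between `passed_mentors_bar` and `is_ranked` reported for 10.0 is an empirical property of the data, not something this sketch enforces.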

SRP/FRP rubric differences

7.0 SRP — ToC + Activities scored.
8.0 SRP — ToC + Progress + Activities scored.
9.0 FRP — research success, AI risk reduction, researcher ability, symposium impact. Quite different rubric.

Because the rubrics differ, raw scores are not comparable across cohorts; we report rank-based correlations only, computed within each cohort.
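For readers unfamiliar with rank-based correlation, here is a small standard-library sketch of Spearman's ρ (Pearson correlation on average ranks). It assumes the reported ρ values are Spearman coefficients, which the text does not state explicitly, and the toy numbers are invented. The point it illustrates is that a rank correlation only depends on ordering, so it is unaffected by the scale a rubric happens to use.

```python
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented toy data: plan scores and a mentor composite for six fellows.
plan_scores = [3.0, 4.5, 4.5, 2.0, 5.0, 3.5]
mentor_composite = [6.0, 7.0, 8.5, 5.5, 7.5, 9.0]

rho = spearman(plan_scores, mentor_composite)
# Rescaling the plan scores (e.g. a rubric on a 0-100 scale) leaves rho unchanged.
rho_rescaled = spearman([20 * s for s in plan_scores], mentor_composite)
print(rho, rho_rescaled)
```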

SRP/FRP final score × mentor-eval composite

Cohort	n joined	ρ(SRP/FRP, mentor composite)
7.0	72	+0.23
8.0	98	+0.11
9.0	86	+0.06

SRP/FRP final score × post-program publications

Cohort	n	ρ(SRP/FRP, has_pub)	ρ(SRP/FRP, n_pubs)
7.0	76	+0.08	+0.03
8.0	100	-0.03	-0.04
9.0	87	+0.04	+0.04
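One way to gauge how much sampling noise sits behind correlations of this size is an approximate confidence interval from the Fisher z-transformation. The sketch below is not part of the original analysis. The transformation is derived for Pearson correlations, so applying it to rank correlations is only an approximation, and the 95% level (`z_crit=1.96`) is an assumed choice. The input values are the n and ρ from the mentor-composite rows above.

```python
import math
from typing import Tuple


def fisher_ci(rho: float, n: int, z_crit: float = 1.96) -> Tuple[float, float]:
    """Approximate confidence interval for a correlation via Fisher's z.

    Approximation only: standard error 1/sqrt(n - 3) is the Pearson result.
    """
    if n <= 3:
        raise ValueError("need n > 3 for the Fisher z interval")
    z = math.atanh(rho)            # transform to the z scale
    se = 1.0 / math.sqrt(n - 3)    # approximate standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# (cohort, n, rho) copied from the mentor-composite rows.
rows = [("7.0", 72, 0.23), ("8.0", 98, 0.11), ("9.0", 86, 0.06)]
for cohort, n, rho in rows:
    lo, hi = fisher_ci(rho, n)
    print(f"{cohort}: rho={rho:+.2f}, approx. 95% CI [{lo:+.2f}, {hi:+.2f}]")
```

The same function can be applied to the publication correlations by substituting their n and ρ values.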
