The most downstream outcome we can measure for fellows is post-program publications. The alumni-publication tracker covers cohorts 5.0–9.0 (584 entries) with paper counts and citation counts per fellow. We join to application data and ask: which application features predict whether a fellow goes on to publish?
Note: this is far downstream of selection. Many factors between getting into MATS and publishing — mentor fit, project, post-MATS opportunities — affect the outcome. We expect low R² / modest AUC, and 'publication' is a narrow outcome that doesn't capture all kinds of impact.
MATS (Machine Alignment, Transparency & Security) is an AI safety research fellowship that places ~120 fellows with ~100 mentors per cohort. Cohort 10.0 ran in summer 2026 and was the first cohort with a centralized application review instead of decentralized stream-specific review. This analysis is part of a broader effort to evaluate the 10.0 process and inform the design of 11.0 (autumn 2026).
The 10.0 pipeline in brief. ~2,200 people applied. Each applicant went through three stages:
For the empirical track, the composite formula is 0.50·RS + 0.35·TE + 0.15·SS, where TE = 0.50·MLE + 0.30·SWE + 0.20·Math. A "relevance multiplier" (Direct=1.0 / Adjacent=0.85 / Distant=0.60) is applied to Research Skills based on how the applicant's experience matches the streams they applied to.
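As a worked illustration of the formula above, here is a minimal Python sketch. The function and argument names are hypothetical, the score scale is left open, and it assumes the relevance multiplier scales RS before the 0.50 weight is applied, which is how the sentence above reads but is not confirmed by pipeline code.

```python
# Multipliers and weights are copied from the text; all names are illustrative.
RELEVANCE_MULTIPLIER = {"Direct": 1.0, "Adjacent": 0.85, "Distant": 0.60}


def technical_score(mle, swe, math):
    """TE = 0.50*MLE + 0.30*SWE + 0.20*Math."""
    return 0.50 * mle + 0.30 * swe + 0.20 * math


def empirical_composite(rs, mle, swe, math, ss, relevance="Direct"):
    """Composite = 0.50*RS + 0.35*TE + 0.15*SS for the empirical track."""
    # Assumption: the multiplier is applied to RS before weighting.
    adjusted_rs = RELEVANCE_MULTIPLIER[relevance] * rs
    te = technical_score(mle, swe, math)
    return 0.50 * adjusted_rs + 0.35 * te + 0.15 * ss


# Same raw scores, different relevance to the streams applied to.
direct = empirical_composite(4, 4, 4, 4, 4, relevance="Direct")
distant = empirical_composite(4, 4, 4, 4, 4, relevance="Distant")
```

With identical raw scores, moving from Direct to Distant lowers the composite only through the RS term, so the largest possible penalty is 0.50 × 0.40 = 20% of the RS score.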
Outcome definitions used throughout these analyses:
is_ranked (primary outcome): applicant was ranked by ≥1 stream. This is the cleanest signal of "the selection process picked this person." Not the same as "received an offer": offer count is bounded by cohort size (~120), but rank count reflects quality independently of capacity.
is_invited_to_worktest (secondary outcome): applicant was engaged by ≥1 stream in any way: invited to a work test, invited to an interview, ranked, or sent the Megastream takehome. Strict superset of is_ranked. One level above is_ranked in the funnel.
passed_mentors_bar: applicant was offered or waitlisted. In 10.0, this equals is_ranked exactly (every ranked person got either an offer or a waitlist slot).

| Latest cohort | n alumni-pub rows | with ≥1 pub | median n_pubs | mean citations |
|---|---|---|---|---|
| 5.0 | 26 | 14 | 1 | 41.7 |
| 5.1 | 25 | 19 | 1 | 45.9 |
| 6.0 | 42 | 26 | 1 | 33.0 |
| 6.1 | 33 | 31 | 3 | 37.8 |
| 7.0 | 27 | 19 | 2 | 10.3 |
| 7.1 | 46 | 37 | 2 | 18.9 |
| 8.0 | 21 | 10 | 0 | 0.9 |
| 8.1 | 75 | 47 | 1 | 8.3 |
| 9.0 | 87 | 12 | 0 | 2.2 |

| Cohort | n | with pub | AUC | 95% CI |
|---|---|---|---|---|
| 6.0 | 42 | 26 | 0.724 | [0.544, 0.883] |
| 7.0 | 26 | 18 | 0.757 | [0.542, 0.944] |
| 8.0 | 21 | 10 | 0.777 | [0.564, 0.969] |
| 9.0 | 87 | 13 | 0.753 | [0.593, 0.882] |
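The wide intervals in this table reflect the small per-cohort samples. This section does not say how the AUC or its interval was computed, so the following is only a sketch of one common approach: a rank-based AUC with a percentile bootstrap. The function names, the number of resamples, and the choice to skip single-class resamples are all assumptions.

```python
import random


def auc(scores, labels):
    """P(random positive outscores random negative); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC, resampling fellows with replacement."""
    if not 0 < sum(labels) < len(labels):
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    indices = range(len(scores))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # AUC is undefined for one-class resamples
            stats.append(auc([scores[i] for i in sample], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data (made up): model scores and has_post_pub labels for ten fellows.
toy_scores = [0.9, 0.8, 0.7, 0.65, 0.6, 0.5, 0.4, 0.35, 0.3, 0.1]
toy_labels = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]
point = auc(toy_scores, toy_labels)
lower, upper = bootstrap_auc_ci(toy_scores, toy_labels, n_boot=500)
```

With samples this small, a lower bound just above 0.5 (as in every row of the table) is weak evidence of predictive signal.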
Sample. Per cohort: alumni-pub rows whose latest cohort matches, joined to apps via person_id. Listwise-complete on cohort-specific features.
Outcome variable(s). has_post_pub (1 if ≥1 publication tracked).
Predictor fields. Per cohort: CodeSignal, education (ordinal), and number of bg-tier items (Familiar/Applied/Expert). For 8.0 and 9.0, centralized review scores are added.
Filters applied. Multi-cohort fellows attributed to latest cohort.
Missing-data handling. Listwise drop on features.
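The sample construction described above (latest-cohort attribution, join on person_id, listwise drop) can be sketched as follows. This is an illustration, not the analysis code: the field names `cohorts`, `codesignal`, and `education` are hypothetical, the toy rows are made up, and it assumes `apps_by_person` already holds the application record relevant to the cohort being analysed.

```python
def cohort_key(label):
    """Parse '8.1' into (8, 1) so that '10.0' sorts after '9.0'."""
    major, minor = label.split(".")
    return int(major), int(minor)


def build_sample(pub_rows, apps_by_person, cohort, features):
    """Return listwise-complete rows for fellows whose latest cohort is `cohort`."""
    sample = []
    for row in pub_rows:
        # Multi-cohort fellows are attributed to their latest cohort only.
        if max(row["cohorts"], key=cohort_key) != cohort:
            continue
        app = apps_by_person.get(row["person_id"])
        if app is None:
            continue  # no application record to join to
        values = {f: app.get(f) for f in features}
        if any(v is None for v in values.values()):
            continue  # listwise drop: any missing feature removes the row
        sample.append({
            "person_id": row["person_id"],
            "has_post_pub": int(row["n_pubs"] >= 1),
            **values,
        })
    return sample


pub_rows = [
    {"person_id": "a", "cohorts": ["8.0", "9.0"], "n_pubs": 2},
    {"person_id": "b", "cohorts": ["9.0"], "n_pubs": 0},
    {"person_id": "c", "cohorts": ["9.0"], "n_pubs": 1},
]
apps = {
    "a": {"codesignal": 540, "education": 3},
    "b": {"codesignal": 500, "education": 2},
    "c": {"codesignal": None, "education": 4},  # dropped: missing feature
}
rows_9 = build_sample(pub_rows, apps, "9.0", ["codesignal", "education"])
rows_8 = build_sample(pub_rows, apps, "8.0", ["codesignal", "education"])
```

Sorting cohort labels as numeric pairs rather than strings matters once two-digit cohorts appear, since the string "10.0" sorts before "9.0".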
Key assumptions / caveats.
n_pubs is parsed from a comma-separated title list, which assumes titles do not themselves contain commas. A spot check looked fine, but the count is not guaranteed to be exact.
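To make the comma caveat concrete, here is a small sketch of a naive comma-split count and a heuristic for finding rows worth a manual check. Both functions are hypothetical and are not the tracker's actual parser. The heuristic flags fragments that start with a lowercase letter or consist of a single word, and it will produce false positives.

```python
def count_pubs_naive(titles):
    """Count non-empty comma-separated fragments as if each were one title."""
    return sum(1 for part in titles.split(",") if part.strip())


def suspicious_fragments(titles):
    """Heuristic: fragments that look like pieces of a title split on a comma."""
    flagged = []
    for part in titles.split(","):
        part = part.strip()
        if part and (part[0].islower() or len(part.split()) == 1):
            flagged.append(part)
    return flagged


clean = "Paper One, Paper Two"
comma_title = "Safety, Alignment, and Robustness in LLMs"  # one title, made up

print(count_pubs_naive(clean))            # 2
print(count_pubs_naive(comma_title))      # 3, although it is a single paper
print(suspicious_fragments(comma_title))  # ['Safety', 'Alignment', 'and Robustness in LLMs']
```

Because has_post_pub only asks whether n_pubs is at least 1, an overcount like this changes the median n_pubs column but not the binary outcome.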