C6 — Demographics across cohorts

Context

10.0 moved to a more centralized, partially-blinded review process. Did that change demographic outcomes? We track gender and race composition and outcomes across cohorts 7.0–10.0. (6.0 didn't collect demographics, so it's excluded.)

MATS (Machine Alignment, Transparency & Security) is an AI safety research fellowship that places ~120 fellows with ~100 mentors per cohort. Cohort 10.0 ran in summer 2026 and was the first cohort with a centralized application review instead of decentralized stream-specific review. This analysis is part of a broader effort to evaluate the 10.0 process and inform the design of 11.0 (autumn 2026).

How the 10.0 selection pipeline worked

The 10.0 pipeline in brief. ~2,200 people applied. Each applicant went through three stages:

Stage 1 — submitted background / experience / motivation, picked which research tracks they were interested in (Empirical, Policy & Strategy, Technical Governance, Theory, Compute Infrastructure). An LLM screen filtered out applicants who clearly didn't meet a minimum bar, and produced advisory per-stream recommendations.
Stage 2 — applicants who passed Stage 1 had their materials scored by LLM-graded rubrics. The empirical track used a composite score combining Research Skills, Technical Execution (split into MLE, SWE, Math sub-scores), and Soft Skills. The top ~600 by composite advanced to Stage 3.
Stage 3 — applicants chose specific mentors / "streams" to apply to. Each stream reviewed its applicants and produced a ranked list. Top-ranked applicants got offers; lower-ranked got waitlisted. ~120 offers were made.

For the empirical track, the composite formula is 0.50·RS + 0.35·TE + 0.15·SS, where TE = 0.50·MLE + 0.30·SWE + 0.20·Math. A "relevance multiplier" (Direct=1.0 / Adjacent=0.85 / Distant=0.60) is applied to Research Skills based on how the applicant's experience matches the streams they applied to.
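The formula above can be sketched in a few lines of Python. The weights and relevance multipliers come from the text; the function and argument names are hypothetical, the score scale is left open, and the sketch assumes the multiplier scales Research Skills before the 0.50 weight is applied.

```python
# Relevance multipliers as stated in the text.
RELEVANCE_MULTIPLIER = {"Direct": 1.0, "Adjacent": 0.85, "Distant": 0.60}


def technical_execution(mle: float, swe: float, math_score: float) -> float:
    """TE = 0.50*MLE + 0.30*SWE + 0.20*Math."""
    return 0.50 * mle + 0.30 * swe + 0.20 * math_score


def empirical_composite(rs: float, mle: float, swe: float, math_score: float,
                        ss: float, relevance: str = "Direct") -> float:
    """Composite = 0.50*RS + 0.35*TE + 0.15*SS.

    Assumption: the relevance multiplier is applied to RS before weighting.
    """
    adjusted_rs = rs * RELEVANCE_MULTIPLIER[relevance]
    te = technical_execution(mle, swe, math_score)
    return 0.50 * adjusted_rs + 0.35 * te + 0.15 * ss


# Illustrative inputs only (not real applicant scores).
direct = empirical_composite(8, 7, 6, 5, 9, relevance="Direct")
adjacent = empirical_composite(8, 7, 6, 5, 9, relevance="Adjacent")
```

With identical sub-scores, an Adjacent applicant scores lower than a Direct one only through the Research Skills term, since TE and SS are not affected by the multiplier under this reading.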

Outcome definitions

Outcome definitions used throughout these analyses:

is_ranked (primary outcome) — applicant was ranked by ≥1 stream. This is the cleanest signal of "the selection process picked this person." Not the same as "received an offer" — offer count is bounded by cohort size (~120), but rank count reflects quality independently of capacity.
is_invited_to_worktest (secondary outcome) — applicant was engaged by ≥1 stream in any way: invited to a work test, invited to an interview, ranked, or sent the Megastream takehome. Strict superset of is_ranked. One level above is_ranked in the funnel.
passed_mentors_bar — applicant was offered or waitlisted. In 10.0, this equals is_ranked exactly (every ranked person got either an offer or a waitlist slot).
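As a minimal sketch, the three flags could be derived from per-stream engagement records as follows. The record layout and field names are hypothetical, not the actual schema; only the logic of each definition is taken from the text.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class StreamRecord:
    """Hypothetical record of one applicant's interaction with one stream."""
    invited_to_worktest: bool = False
    invited_to_interview: bool = False
    ranked: bool = False
    sent_megastream_takehome: bool = False
    offered: bool = False
    waitlisted: bool = False


def outcome_flags(records: Iterable[StreamRecord]) -> dict:
    """Compute applicant-level outcomes from that applicant's stream records."""
    records = list(records)
    is_ranked = any(r.ranked for r in records)
    # Any form of engagement counts, including being ranked.
    is_invited_to_worktest = any(
        r.invited_to_worktest or r.invited_to_interview
        or r.ranked or r.sent_megastream_takehome
        for r in records
    )
    passed_mentors_bar = any(r.offered or r.waitlisted for r in records)
    return {
        "is_ranked": is_ranked,
        "is_invited_to_worktest": is_invited_to_worktest,
        "passed_mentors_bar": passed_mentors_bar,
    }


# Illustrative applicant: interviewed by one stream, ranked and waitlisted by another.
flags = outcome_flags([
    StreamRecord(invited_to_interview=True),
    StreamRecord(ranked=True, waitlisted=True),
])
```

Because ranked is one of the conditions for is_invited_to_worktest, every applicant with is_ranked set also has is_invited_to_worktest set, which is the superset relationship described above.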

Caveats

Race is missing for 52% of 9.0 applicants (per CLAUDE.md). Collection improved in 10.0, but roughly 30% or more of values are still missing in places.
Race data has small group sizes — Wilson CIs are wide for underrepresented groups.
'Passed mentors bar' is the same outcome used elsewhere: offered + waitlisted (or accepted-mentor proxy for 7.0/8.0).
This is descriptive — we do NOT model causal effects of demographics on outcomes.

Cohort	Gender	n	n passed	P(passed) [95% CI]
7.0	Man	377	31	8.2% [5.9%, 11.4%]
7.0	Non-binary	14	2	14.3% [4.0%, 39.9%]
7.0	Woman	126	14	11.1% [6.7%, 17.8%]
8.0	Man	597	42	7.0% [5.2%, 9.4%]
8.0	Non-binary	20	2	10.0% [2.8%, 30.1%]
8.0	Woman	228	12	5.3% [3.0%, 9.0%]
9.0	Man	484	45	9.3% [7.0%, 12.2%]
9.0	Non-binary	11	2	18.2% [5.1%, 47.7%]
9.0	Woman	205	21	10.2% [6.8%, 15.2%]
10.0	Man	918	76	8.3% [6.7%, 10.2%]
10.0	Non-binary	25	1	4.0% [0.7%, 19.5%]
10.0	Woman	430	30	7.0% [4.9%, 9.8%]
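The intervals in the table are consistent with Wilson score intervals, which the caveats name. A minimal sketch of that computation using only the standard library is shown here; the function name is hypothetical and this is not necessarily the code that produced the table.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95):
    """Wilson score interval for a binomial proportion (two-sided)."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Counts taken from the 10.0 rows of the table.
low_nb, high_nb = wilson_interval(1, 25)     # Non-binary: 1 of 25 passed
low_man, high_man = wilson_interval(76, 918)  # Man: 76 of 918 passed
```

For the 10.0 Non-binary row (1 of 25) this gives roughly 0.7% to 19.5%, and for the 10.0 Man row (76 of 918) roughly 6.7% to 10.2%, matching the table. The Wilson interval stays inside [0, 1] and behaves better than the normal approximation at small n, which is why it suits the smaller groups here.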

C6 — Did demographic outcomes change across cohorts?

[Figures: Gender outcomes (Man / Woman / Non-binary); Race outcomes (top 6 groups by sample size); Pool composition by gender]

Takeaways