A4 — Special advances

Context

Most Stage-2 → Stage-3 advancement was decided by the composite score: top scorers got through, others didn't. But three non-standard pathways also got people into Stage 3 even if they didn't clear the composite bar. This analysis asks whether those pathways added value: did the people advanced through them perform comparably to people advanced through the regular composite-based pathway?

MATS (Machine Alignment, Transparency & Security) is an AI safety research fellowship that places ~120 fellows with ~100 mentors per cohort. Cohort 10.0 ran in summer 2026 and was the first cohort with a centralized application review instead of decentralized stream-specific review. This analysis is part of a broader effort to evaluate the 10.0 process and inform the design of 11.0 (autumn 2026).

How the 10.0 selection pipeline worked

The 10.0 pipeline in brief. About 2,200 people applied. The pipeline had three stages, and applicants could be filtered out at each one:

Stage 1 — submitted background / experience / motivation, picked which research tracks they were interested in (Empirical, Policy & Strategy, Technical Governance, Theory, Compute Infrastructure). An LLM screen filtered out applicants who clearly didn't meet a minimum bar, and produced advisory per-stream recommendations.
Stage 2 — applicants who passed Stage 1 had their materials scored by LLM-graded rubrics. The empirical track used a composite score combining Research Skills, Technical Execution (split into MLE, SWE, Math sub-scores), and Soft Skills. The top ~600 by composite advanced to Stage 3.
Stage 3 — applicants chose specific mentors / "streams" to apply to. Each stream reviewed its applicants and produced a ranked list. Top-ranked applicants got offers; lower-ranked got waitlisted. ~120 offers were made.

For the empirical track, the composite formula is 0.50·RS + 0.35·TE + 0.15·SS, where TE = 0.50·MLE + 0.30·SWE + 0.20·Math. A "relevance multiplier" (Direct=1.0 / Adjacent=0.85 / Distant=0.60) is applied to Research Skills based on how the applicant's experience matches the streams they applied to.
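A minimal Python sketch of the empirical-track composite, written directly from the formula above. The function names and the multiplier lookup table are illustrative, not taken from the actual scoring code. The sketch assumes all sub-scores share one scale and that the relevance multiplier scales RS before the 0.50 weight is applied.

```python
# Hypothetical sketch of the 10.0 empirical-track composite; names are illustrative.
RELEVANCE_MULTIPLIER = {"Direct": 1.0, "Adjacent": 0.85, "Distant": 0.60}


def technical_execution(mle: float, swe: float, math_score: float) -> float:
    """TE = 0.50*MLE + 0.30*SWE + 0.20*Math."""
    return 0.50 * mle + 0.30 * swe + 0.20 * math_score


def empirical_composite(rs: float, mle: float, swe: float, math_score: float,
                        ss: float, relevance: str) -> float:
    """Composite = 0.50*(RS * relevance multiplier) + 0.35*TE + 0.15*SS.

    Assumes the multiplier applies to RS only, as described above.
    """
    rs_adjusted = rs * RELEVANCE_MULTIPLIER[relevance]  # KeyError on unknown labels
    te = technical_execution(mle, swe, math_score)
    return 0.50 * rs_adjusted + 0.35 * te + 0.15 * ss


# Example: an "Adjacent" applicant with RS=4, MLE=3, SWE=3, Math=2, SS=4.
print(round(empirical_composite(4, 3, 3, 2, 4, "Adjacent"), 2))  # 3.28
```

Because the multiplier touches only RS, a Distant match lowers the composite by 0.50 × (1 − 0.60) × RS = 0.20 × RS relative to a Direct match, with everything else unchanged.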

Outcome definitions

Outcome definitions used throughout these analyses:

is_ranked (primary outcome) — applicant was ranked by ≥1 stream. This is the cleanest signal of "the selection process picked this person." Not the same as "received an offer" — offer count is bounded by cohort size (~120), but rank count reflects quality independently of capacity.
is_invited_to_worktest (secondary outcome) — applicant was engaged by ≥1 stream in any way: invited to a work test, invited to an interview, ranked, or sent the Megastream take-home. This is a strict superset of is_ranked, one level above it in the funnel.
passed_mentors_bar — applicant was offered or waitlisted. In 10.0, this equals is_ranked exactly (every ranked person got either an offer or a waitlist slot).
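The three outcome flags can be derived from per-applicant stream records roughly as follows. This is a sketch under assumptions: the event labels (`worktest`, `interview`, `ranked`, `megastream_takehome`), the `Applicant` record, and the decision values are hypothetical stand-ins for whatever the real pipeline export uses.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical event labels for the ways a stream can engage an applicant.
ENGAGEMENT_EVENTS = {"worktest", "interview", "ranked", "megastream_takehome"}


@dataclass
class Applicant:
    applicant_id: str
    # (stream, event) pairs recorded for this applicant across all streams
    stream_events: list[tuple[str, str]] = field(default_factory=list)
    decision: Optional[str] = None  # hypothetical: "offer", "waitlist", or None


def outcome_flags(a: Applicant) -> dict[str, bool]:
    events = {event for _, event in a.stream_events}
    is_ranked = "ranked" in events
    # Being ranked is itself an engagement, so every ranked applicant is also invited.
    is_invited_to_worktest = bool(events & ENGAGEMENT_EVENTS)
    # Computed from decisions, not copied from is_ranked.
    passed_mentors_bar = a.decision in ("offer", "waitlist")
    return {
        "is_ranked": is_ranked,
        "is_invited_to_worktest": is_invited_to_worktest,
        "passed_mentors_bar": passed_mentors_bar,
    }


ranked = Applicant("a1", [("stream_x", "worktest"), ("stream_y", "ranked")], "waitlist")
interviewed = Applicant("a2", [("stream_x", "interview")])
print(outcome_flags(ranked))
print(outcome_flags(interviewed))
```

Computing passed_mentors_bar from decisions rather than from is_ranked means the 10.0 identity (passed_mentors_bar equals is_ranked) can be checked against the data instead of assumed.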

The three special pathways

A4a — CodeSignal special advances (n=14). Applicants originally rejected at Stage 2 but advanced because they scored in the top 10% on the CodeSignal coding test AND were in the top 10% of composite scores within the rejected pool. Idea: if someone is rejected on composite but is technically very strong, give them a second look.
A4b — Neel Nanda trainee audit (n=23). Neel Nanda runs a parallel pre-MATS interpretability training program. 35 of his trainees also applied through the regular MATS pipeline; we could match 23 by email. Per Neel's own audit, 9 passed Stage 2 normally and 14 did not. Question: when the regular pipeline disagrees with Neel's hand-selection, who's right?
A4c — Stream-rec topped-ups (n=133). Applicants who failed the composite bar but whom the Stage-1 LLM had tagged 'Strong advance' for specific streams. We let them through to those specific streams only. Includes 15 'Nanda-only' topped-ups: Neel exploration-phase participants, separate from the 23 A4b trainees above, who went through Neel's own process.
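The A4a and A4c eligibility rules can be sketched as filters over the Stage-2-rejected pool. This is an illustration under assumptions, not the pipeline's code: "top 10%" is taken as the top ceil(10% × n) by rank, the CodeSignal percentile is computed over whatever population is passed in (the source does not say which pool it used), and the dictionary shapes and the placeholder tag "Advance" are hypothetical.

```python
def top_pct(scores: dict[str, float], pct: int = 10) -> set[str]:
    """IDs in the top `pct` percent of `scores` by rank (ceiling; ties keep input order)."""
    k = (len(scores) * pct + 99) // 100  # integer ceiling of len * pct / 100
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def a4a_special_advances(rejected_composite: dict[str, float],
                         codesignal_scores: dict[str, float]) -> set[str]:
    """Stage-2 rejects in the top 10% on CodeSignal AND in the top 10% of
    composite within the rejected pool (pool for CodeSignal is an assumption)."""
    return top_pct(codesignal_scores) & top_pct(rejected_composite)


def a4c_topups(rejected_ids: set[str],
               stage1_recs: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Map each Stage-2 reject to the streams the Stage-1 LLM tagged 'Strong advance';
    the applicant is let through to those streams only."""
    topups = {}
    for aid in rejected_ids:
        streams = {s for s, tag in stage1_recs.get(aid, {}).items()
                   if tag == "Strong advance"}
        if streams:
            topups[aid] = streams
    return topups


# Illustrative usage with made-up IDs and scores.
rej = {f"r{i}": float(i) for i in range(20)}
code = {"r18": 99.0, "r19": 10.0, **{f"x{i}": float(i) for i in range(8)}}
print(a4a_special_advances(rej, code))
print(a4c_topups({"r1", "r2"}, {"r1": {"s_a": "Strong advance", "s_b": "Advance"}}))
```

The integer ceiling avoids floating-point surprises such as `0.1 * 30` evaluating to slightly more than 3.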

Headline

A4a: 5/14 CodeSignal special advances were ranked (36%, 95% CI [16%, 61%]). Compare this with the bottom 14 regularly advanced applicants: 0/14 (0%).

A4b Group B: 8/14 Neel trainees who did NOT pass Stage 2 were ranked (57%). Their composite distribution sits clearly below the Stage-2 → Stage-3 cutoff (0.00), with mean 2.12. None of Group B's composites sit far above the cutoff, so the composite is not obviously wrong to have filtered them out.

A4c: Non-Nanda topped-ups were ranked at 13/118 (11%). Nanda-only topped-ups were ranked at 9/15 (60%). Note that Nanda runs a separate process, so these ranks may come from other streams, and some of these applicants may never have entered Nanda's process at all.

Group-level summary

A4a — CodeSignal top-10% special advances (n = 14)

Goal: did this non-standard pathway add value, or did we let in a batch of false positives?

Group	n	Ranked (95% CI)	Invited (95% CI)	Offered or waitlisted	Mean composite
All empirical Stage 3 (reference)	791	147/791 (19%) [16%, 21%]	393/791 (50%) [46%, 53%]	147/791 (19%)	2.46
Bottom 14 regular empirical Stage-3 advances by composite (no special advances)	14	0/14 (0%) [0%, 22%]	5/14 (36%) [16%, 61%]	0/14 (0%)	0.06
A4a — CodeSignal top-10% special advances	14	5/14 (36%) [16%, 61%]	7/14 (50%) [27%, 73%]	5/14 (36%)	2.55
Rest of empirical Stage-2-rejected pool (no special advance)	892	0/892 (0%) [0%, 0%]	11/892 (1%) [1%, 2%]	0/892 (0%)	1.78
A4b — All Neel trainees in 10.0 pipeline	23	13/23 (57%) [37%, 74%]	17/23 (74%) [54%, 87%]	13/23 (57%)	2.42
A4b — Neel trainees that passed Stage 2 (Group A, n=9)	9	5/9 (56%) [27%, 81%]	8/9 (89%) [57%, 98%]	5/9 (56%)	2.87
A4b — Neel trainees that did NOT pass Stage 2 (Group B, n=14)	14	8/14 (57%) [33%, 79%]	9/14 (64%) [39%, 84%]	8/14 (57%)	2.12
A4c — Stream-rec topped-up (any, incl. Nanda)	133	22/133 (17%) [11%, 24%]	63/133 (47%) [39%, 56%]	22/133 (17%)	2.22
A4c — Stream-rec topped-up (non-Nanda only)	118	13/118 (11%) [7%, 18%]	53/118 (45%) [36%, 54%]	13/118 (11%)	2.25
A4c — Stream-rec topped-up (Nanda-only)	15	9/15 (60%) [36%, 80%]	10/15 (67%) [42%, 85%]	9/15 (60%)	1.98
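The 95% intervals in the table are consistent with the Wilson score interval: for example, 5/14 reproduces [16%, 61%], 8/14 reproduces [33%, 79%], and 13/118 reproduces [7%, 18%]. The source does not name the method, so treating it as Wilson (with z = 1.96) is an inference. A minimal sketch, where `wilson_interval` and `fmt` are hypothetical helper names:

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (95% by default)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp tiny floating-point overshoots at the 0/n and n/n edges.
    return max(0.0, center - half), min(1.0, center + half)


def fmt(successes: int, n: int) -> str:
    """Format a count the way the summary table does: k/n (rate) [lo, hi]."""
    lo, hi = wilson_interval(successes, n)
    return f"{successes}/{n} ({successes / n:.0%}) [{lo:.0%}, {hi:.0%}]"


print(fmt(5, 14))  # 5/14 (36%) [16%, 61%]
print(fmt(0, 14))  # 0/14 (0%) [0%, 22%]
```

Unlike the simple Wald interval, which collapses to [0%, 0%] at 0/n, the Wilson interval stays informative for zero counts in small groups.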

The benchmark is the marginal regularly advanced applicant. If special advances rank at a similar rate to the bottom of the regular advances, the pathway roughly breaks even; if they rank lower, it cost us, and if higher, it paid off.
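With groups this small (14 vs 14), a raw gap such as 5/14 vs 0/14 is worth checking against chance. One option, not part of the original analysis, is a two-sided Fisher exact test on the 2×2 table of ranked vs not ranked. The sketch below uses only the standard library; `scipy.stats.fisher_exact` with its default two-sided alternative should agree. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. special advances vs. marginal regular advances);
    columns are outcomes (ranked, not ranked). Sums the probabilities of all
    tables with the same margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def weight(x: int) -> int:
        # Hypergeometric numerator for "x ranked in row 1"; the shared
        # denominator comb(n, col1) is applied once at the end, so the
        # comparison against the observed table is exact integer arithmetic.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    extreme = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return extreme / comb(n, col1)


# A4a (5 of 14 ranked) vs. bottom-14 regular advances (0 of 14 ranked).
print(round(fisher_exact_two_sided(5, 9, 0, 14), 3))
```

For the A4a comparison this gives a p-value of about 0.041. Because the bottom-14 benchmark was chosen by composite rank and both groups are tiny, treat it as a rough signal rather than a verdict.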

A4b — Neel trainee audit (n = 23)

Two questions:
1. Of Neel's 23 trainees in the regular pipeline, how many made it through? Group A (9 passed Stage 2): 5 ranked. Group B (14 failed Stage 2): 8 ranked, necessarily via other paths (possibly Neel's own process).
2. Did the composite catch them or miss them? See box plot below: Group B sits clearly below the Stage-2 → Stage-3 cutoff. The composite identifies them as below the bar, not as borderline misses.
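The Group A / Group B split can be sketched as an email join followed by a count. This assumes both the trainee list and the pipeline export are keyed by email and that trimming plus lowercasing is enough to match them; the source says only that matching was by email. The field names `passed_stage2` and `is_ranked` are hypothetical.

```python
def normalize_email(email: str) -> str:
    # Assumed normalization: trim whitespace and lowercase the whole address.
    return email.strip().lower()


def a4b_groups(trainee_emails: list[str],
               applicants: dict[str, dict[str, bool]]) -> dict[str, dict[str, int]]:
    """Split email-matched trainees into Group A (passed Stage 2) and Group B (did not).

    `applicants` maps email -> {"passed_stage2": bool, "is_ranked": bool}
    (hypothetical field names).
    """
    by_email = {normalize_email(e): rec for e, rec in applicants.items()}
    groups = {"A": {"n": 0, "ranked": 0}, "B": {"n": 0, "ranked": 0}}
    for email in {normalize_email(e) for e in trainee_emails}:
        rec = by_email.get(email)
        if rec is None:
            continue  # unmatched trainee: not found in the pipeline under this email
        g = groups["A" if rec["passed_stage2"] else "B"]
        g["n"] += 1
        g["ranked"] += int(rec["is_ranked"])
    return groups


trainees = ["Ann@x.org ", "bob@y.org", "cat@z.org"]
apps = {
    "ann@x.org": {"passed_stage2": True, "is_ranked": True},
    "BOB@y.org": {"passed_stage2": False, "is_ranked": False},
    "dan@w.org": {"passed_stage2": True, "is_ranked": True},
}
print(a4b_groups(trainees, apps))
```

Unmatched trainees are skipped rather than counted, which mirrors how only 23 of Neel's 35 trainees entered the comparison.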

Neel's selection and the composite's selection disagree. Whether that's because the composite undervalues something Neel sees (or vice versa) would need a three-way comparison against Neel's own ranking, which is out of scope here.

A4c — Stream-rec topped-up (n = 133)

Topped-ups split:
- Non-Nanda (118): ranked at 11%.
- Nanda-only (15): ranked at 60% (Nanda runs a separate process; this counts ranks from any stream).

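To gauge whether this stream-rec pathway selects for applicants the composite missed, compare the non-Nanda ranked rate (13/118, 11% [7%, 18%]) against the broader empirical Stage-3 ranked rate (147/791, 19% [16%, 21%]) and against the marginal regular advances used as the benchmark above (0/14).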