Most Stage-2 → Stage-3 advancement was decided by the composite score: top scorers got through, others didn't. But three non-standard pathways also got people into Stage 3 even if they didn't clear the composite bar. This analysis asks whether those pathways added value — i.e., did the people advanced through them perform comparably to people advanced through the regular composite-based pathway?
MATS (Machine Alignment, Transparency & Security) is an AI safety research fellowship that places ~120 fellows with ~100 mentors per cohort. Cohort 10.0 ran in summer 2026 and was the first cohort with a centralized application review instead of decentralized stream-specific review. This analysis is part of a broader effort to evaluate the 10.0 process and inform the design of 11.0 (autumn 2026).
The 10.0 pipeline in brief. ~2,200 people applied. Each applicant went through three stages:
For the empirical track, the composite formula is 0.50·RS + 0.35·TE + 0.15·SS, where TE = 0.50·MLE + 0.30·SWE + 0.20·Math. A "relevance multiplier" (Direct=1.0 / Adjacent=0.85 / Distant=0.60) is applied to Research Skills based on how the applicant's experience matches the streams they applied to.
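The formula above can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual code: the function and argument names are hypothetical, the input scale is not stated in the text, and the sketch assumes the relevance multiplier is applied to RS before the 0.50 weight.

```python
# Weights and multipliers are taken from the text; all names are hypothetical.
RELEVANCE_MULTIPLIER = {"Direct": 1.0, "Adjacent": 0.85, "Distant": 0.60}


def technical_experience(mle: float, swe: float, math_score: float) -> float:
    """TE = 0.50*MLE + 0.30*SWE + 0.20*Math."""
    return 0.50 * mle + 0.30 * swe + 0.20 * math_score


def empirical_composite(rs: float, mle: float, swe: float, math_score: float,
                        ss: float, relevance: str = "Direct") -> float:
    """Composite = 0.50*RS + 0.35*TE + 0.15*SS.

    Assumption: the relevance multiplier scales RS before weighting.
    """
    adjusted_rs = rs * RELEVANCE_MULTIPLIER[relevance]
    te = technical_experience(mle, swe, math_score)
    return 0.50 * adjusted_rs + 0.35 * te + 0.15 * ss


if __name__ == "__main__":
    # Same sub-scores, different relevance: only the RS term shrinks.
    for level in ("Direct", "Adjacent", "Distant"):
        print(level, round(empirical_composite(3, 3, 3, 3, 3, level), 3))
```

Because the weights sum to 1.0 at both levels, an applicant with identical sub-scores and Direct relevance gets a composite equal to that common score; Adjacent or Distant relevance lowers only the Research Skills contribution.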
Outcome definitions used throughout these analyses:
- is_ranked (primary outcome): applicant was ranked by ≥1 stream. This is the cleanest signal of "the selection process picked this person." Not the same as "received an offer": offer count is bounded by cohort size (~120), but rank count reflects quality independently of capacity.
- is_invited_to_worktest (secondary outcome): applicant was engaged by ≥1 stream in any way: invited to a work test, invited to an interview, ranked, or sent the Megastream takehome. Strict superset of is_ranked. One level above is_ranked in the funnel.
- passed_mentors_bar: applicant was offered or waitlisted. In 10.0, this equals is_ranked exactly (every ranked person got either an offer or a waitlist slot).

A4a: 5/14 CodeSignal special-advances were ranked (36%, 95% CI [16%, 61%]). Compare to bottom-14 regularly-advanced: 0/14 (0%).
A4b Group B: 8/14 Neel trainees who did NOT pass Stage 2 were ranked (57%). Their composite distribution sits clearly below the Stage-2→Stage-3 cutoff (0.00), with mean 2.12. Group B's composites are NOT extreme outliers above the cutoff — i.e., the composite is not 'obviously wrong' in marking them as filter-outs.
A4c: Non-Nanda topped-ups ranked at 13/118 (11%). Nanda-only topped-ups ranked at 9/15 (60%) (but note: Nanda runs a separate process — these may have been ranked by other streams or not entered Nanda's process at all).
| Group | n | Ranked (95% CI) | Invited (95% CI) | Offered/WL | Mean composite |
|---|---|---|---|---|---|
| All empirical Stage 3 (reference) | 791 | 147/791 (19%) [16%, 21%] | 393/791 (50%) [46%, 53%] | 147/791 (19%) | 2.46 |
| Bottom 14 of regularly-advanced empirical Stage 3 (composite-ordered, no special) | 14 | 0/14 (0%) [0%, 22%] | 5/14 (36%) [16%, 61%] | 0/14 (0%) | 0.06 |
| A4a — CodeSignal top-10% special advances | 14 | 5/14 (36%) [16%, 61%] | 7/14 (50%) [27%, 73%] | 5/14 (36%) | 2.55 |
| Rest of empirical Stage-2-rejected pool (no special advance) | 892 | 0/892 (0%) [0%, 0%] | 11/892 (1%) [1%, 2%] | 0/892 (0%) | 1.78 |
| A4b — All Neel trainees in 10.0 pipeline | 23 | 13/23 (57%) [37%, 74%] | 17/23 (74%) [54%, 87%] | 13/23 (57%) | 2.42 |
| A4b — Neel trainees that passed Stage 2 (Group A, n=9) | 9 | 5/9 (56%) [27%, 81%] | 8/9 (89%) [57%, 98%] | 5/9 (56%) | 2.87 |
| A4b — Neel trainees that did NOT pass Stage 2 (Group B, n=14) | 14 | 8/14 (57%) [33%, 79%] | 9/14 (64%) [39%, 84%] | 8/14 (57%) | 2.12 |
| A4c — Stream-rec topped-up (any, incl. Nanda) | 133 | 22/133 (17%) [11%, 24%] | 63/133 (47%) [39%, 56%] | 22/133 (17%) | 2.22 |
| A4c — Stream-rec topped-up (non-Nanda only) | 118 | 13/118 (11%) [7%, 18%] | 53/118 (45%) [36%, 54%] | 13/118 (11%) | 2.25 |
| A4c — Stream-rec topped-up (Nanda-only) | 15 | 9/15 (60%) [36%, 80%] | 10/15 (67%) [42%, 85%] | 9/15 (60%) | 1.98 |
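The bracketed 95% intervals in the table are consistent with a Wilson score interval at z = 1.96 (for example, 5/14 reproduces [16%, 61%] and 147/791 reproduces [16%, 21%]). The writeup does not name the method, so treat that as an inference. A minimal sketch, with a hypothetical function name:

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Unlike the normal-approximation interval, this stays well defined
    when successes == 0 or successes == n.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    for k, n in [(5, 14), (0, 14), (8, 14), (13, 23)]:
        lo, hi = wilson_interval(k, n)
        print(f"{k}/{n}: [{lo:.0%}, {hi:.0%}]")
```

With so few people in each special-pathway group (n = 14 to 23), the intervals are wide, so rate differences between small groups should be read with that width in mind.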
Goal: did this non-standard pathway add value, or did we let in a batch of false-positives?
The benchmark is the marginal regularly-advanced applicant. If specials rank at a similar rate to the bottom of the regular advances, the pathway is roughly break-even. If they rank lower, it cost us; if higher, it gained us.
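One way to make "similar rate" concrete for groups this small is an exact test on the two counts. The sketch below is not part of the original analysis: it assumes a one-sided Fisher exact test is an acceptable reading of the benchmark comparison, and the function name is hypothetical.

```python
from math import comb


def fisher_one_sided(a: int, n1: int, b: int, n2: int) -> float:
    """One-sided Fisher exact p-value.

    Probability that group 1 has at least `a` successes out of `n1`,
    given the pooled success count a + b across n1 + n2 people
    (hypergeometric upper tail).
    """
    k = a + b  # total successes across both groups
    total = comb(n1 + n2, k)
    upper = min(k, n1)
    tail = sum(comb(n1, x) * comb(n2, k - x) for x in range(a, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Ranked counts from the table: specials 5/14 vs. bottom-14 regular advances 0/14.
    p = fisher_one_sided(5, 14, 0, 14)
    print(f"one-sided p = {p:.4f}")
```

A small p-value here would only say the specials' ranked rate is unlikely to equal the benchmark's by chance; it says nothing about why the two groups differ.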
Two questions:
1. Of Neel's 23 trainees in the regular pipeline, how many made it through? Group A (9 passed Stage 2): 5 ranked. Group B (14 failed Stage 2): 8 ranked (necessarily via other paths, possibly Neel's own process).
2. Did the composite catch them or miss them? See box plot below: Group B sits clearly below the Stage-3 cutoff. The composite is identifying them as below bar, not as borderline misses.
Neel's selection ≠ the composite's selection. Whether that's because the composite undervalues something Neel sees (or vice versa) needs a 3-way look against Neel's own ranking — out of scope here.
Topped-ups split:
- Non-Nanda (118): ranked at 11%.
- Nanda-only (15): ranked at 60% (Nanda runs a separate process; this counts ranks from any stream).
For non-Nanda topped-ups, compare against the broader Stage-3 empirical ranked rate to gauge whether this stream-rec pathway selects for applicants the composite missed.
Sample. Canonical 10.0 sample (deduped). Comparison groups:
- All empirical Stage 3 (reference)
- Bottom 14 of regularly-advanced empirical Stage 3 (composite-ordered, non-special) — for A4a's 'are specials worse than the bottom of the regular pool?'
- Empirical-track Stage-2-rejected pool (excluding specials) — for the broader comparison
- Neel trainees: 9 passed Stage 2 (Group A) / 14 did not (Group B) per neel_fellows_audit.md
- Stream-rec topped-ups: split into non-Nanda and Nanda-only
Neel trainee person_ids resolved via identity map; 23/23 matched.
Outcome variable(s). Primary: is_ranked. Secondary: is_invited_to_worktest. Tertiary: passed_mentors_bar (offered or waitlisted).
Predictor fields. Group membership is the predictor. Composite scores reported descriptively for each group.
Filters applied. Canonical dedup applied. Nanda is not excluded from the topped-up set; instead, topped-up rows are split into Nanda-only vs. non-Nanda so the reader can see Nanda's behavior separately. By standing convention, Nanda is excluded from per-stream analyses but kept in pool-level analyses.
Missing-data handling. No imputation. n reported per group.
Key assumptions / caveats.