A4 — Did the special-advance pathways add value?

Context

Most Stage-2 → Stage-3 advancement was decided by the composite score: top scorers got through, others didn't. But three non-standard pathways also got people into Stage 3 even if they didn't clear the composite bar. This analysis asks whether those pathways added value — i.e., did the people advanced through them perform comparably to people advanced through the regular composite-based pathway?

MATS (Machine Alignment, Transparency & Security) is an AI safety research fellowship that places ~120 fellows with ~100 mentors per cohort. Cohort 10.0 ran in summer 2026 and was the first cohort with a centralized application review instead of decentralized stream-specific review. This analysis is part of a broader effort to evaluate the 10.0 process and inform the design of 11.0 (autumn 2026).

How the 10.0 selection pipeline worked

The 10.0 pipeline in brief. ~2,200 people applied. Each applicant went through three stages:

  1. Stage 1 — submitted background / experience / motivation, picked which research tracks they were interested in (Empirical, Policy & Strategy, Technical Governance, Theory, Compute Infrastructure). An LLM screen filtered out applicants who clearly didn't meet a minimum bar, and produced advisory per-stream recommendations.
  2. Stage 2 — applicants who passed Stage 1 had their materials scored by LLM-graded rubrics. The empirical track used a composite score combining Research Skills, Technical Execution (split into MLE, SWE, Math sub-scores), and Soft Skills. The top ~600 by composite advanced to Stage 3.
  3. Stage 3 — applicants chose specific mentors / "streams" to apply to. Each stream reviewed its applicants and produced a ranked list. Top-ranked applicants got offers; lower-ranked got waitlisted. ~120 offers were made.

For the empirical track, the composite formula is 0.50·RS + 0.35·TE + 0.15·SS, where TE = 0.50·MLE + 0.30·SWE + 0.20·Math. A "relevance multiplier" (Direct=1.0 / Adjacent=0.85 / Distant=0.60) is applied to Research Skills based on how the applicant's experience matches the streams they applied to.
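A minimal Python sketch of the composite as stated above, useful for checking a score by hand. It assumes all sub-scores share one numeric scale, that the relevance multiplier simply scales RS before the 0.50 weight, and that each applicant carries a single relevance label (the report doesn't say how labels are combined when someone applies to several streams). The function names are illustrative, not the pipeline's code.

```python
# Sketch of the empirical-track composite. Assumptions (not confirmed by the
# report): one shared score scale, multiplier applied to RS before weighting,
# and a single relevance label per applicant.

RELEVANCE_MULTIPLIER = {"Direct": 1.0, "Adjacent": 0.85, "Distant": 0.60}


def technical_execution(mle: float, swe: float, math_score: float) -> float:
    """TE = 0.50*MLE + 0.30*SWE + 0.20*Math."""
    return 0.50 * mle + 0.30 * swe + 0.20 * math_score


def empirical_composite(rs: float, mle: float, swe: float, math_score: float,
                        ss: float, relevance: str) -> float:
    """Composite = 0.50*RS_adj + 0.35*TE + 0.15*SS, where RS_adj = RS * multiplier."""
    rs_adj = rs * RELEVANCE_MULTIPLIER[relevance]
    te = technical_execution(mle, swe, math_score)
    return 0.50 * rs_adj + 0.35 * te + 0.15 * ss


# Hypothetical applicant: every sub-score 4, "Adjacent" relevance.
print(round(empirical_composite(4, 4, 4, 4, 4, "Adjacent"), 2))  # prints 3.7
```

Because the multiplier touches only RS, a Distant applicant loses 0.20 × RS of composite relative to an otherwise identical Direct applicant.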

Outcome definitions

Outcome definitions used throughout these analyses:

The three special pathways

Headline

A4a: 5/14 CodeSignal special-advances were ranked (36%, 95% CI [16%, 61%]). Compare to bottom-14 regularly-advanced: 0/14 (0%).

A4b Group B: 8/14 Neel trainees who did NOT pass Stage 2 were ranked (57%). Their composite distribution sits clearly below the Stage-2→Stage-3 cutoff (0.00), with mean 2.12. Group B's composites are NOT extreme outliers above the cutoff — i.e., the composite is not 'obviously wrong' in marking them as filter-outs.

A4c: Non-Nanda topped-ups were ranked at 13/118 (11%). Nanda-only topped-ups were ranked at 9/15 (60%); note, though, that Nanda runs a separate process, so these applicants may have been ranked by other streams or may not have entered Nanda's process at all.

Group-level summary

| Group | n | Ranked (95% CI) | Invited to work test (95% CI) | Offered or waitlisted | Mean composite |
|---|---|---|---|---|---|
| All empirical Stage 3 (reference) | 791 | 147/791 (19%) [16%, 21%] | 393/791 (50%) [46%, 53%] | 147/791 (19%) | 2.46 |
| Bottom 14 of regularly-advanced empirical Stage 3 (composite-ordered, no special) | 14 | 0/14 (0%) [0%, 22%] | 5/14 (36%) [16%, 61%] | 0/14 (0%) | 0.06 |
| A4a: CodeSignal top-10% special advances | 14 | 5/14 (36%) [16%, 61%] | 7/14 (50%) [27%, 73%] | 5/14 (36%) | 2.55 |
| Rest of empirical Stage-2-rejected pool (no special advance) | 892 | 0/892 (0%) [0%, 0%] | 11/892 (1%) [1%, 2%] | 0/892 (0%) | 1.78 |
| A4b: All Neel trainees in 10.0 pipeline | 23 | 13/23 (57%) [37%, 74%] | 17/23 (74%) [54%, 87%] | 13/23 (57%) | 2.42 |
| A4b: Neel trainees that passed Stage 2 (Group A) | 9 | 5/9 (56%) [27%, 81%] | 8/9 (89%) [57%, 98%] | 5/9 (56%) | 2.87 |
| A4b: Neel trainees that did NOT pass Stage 2 (Group B) | 14 | 8/14 (57%) [33%, 79%] | 9/14 (64%) [39%, 84%] | 8/14 (57%) | 2.12 |
| A4c: Stream-rec topped-up (any, incl. Nanda) | 133 | 22/133 (17%) [11%, 24%] | 63/133 (47%) [39%, 56%] | 22/133 (17%) | 2.22 |
| A4c: Stream-rec topped-up (non-Nanda only) | 118 | 13/118 (11%) [7%, 18%] | 53/118 (45%) [36%, 54%] | 13/118 (11%) | 2.25 |
| A4c: Stream-rec topped-up (Nanda-only) | 15 | 9/15 (60%) [36%, 80%] | 10/15 (67%) [42%, 85%] | 9/15 (60%) | 1.98 |
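The 95% intervals in the table are consistent with Wilson score intervals: recomputing 5/14, 147/791, and 13/118 this way reproduces [16%, 61%], [16%, 21%], and [7%, 18%]. A sketch, assuming z = 1.96 and whole-percent rounding; `wilson_ci` and `fmt` are illustrative helpers, not the analysis code. Unlike the normal approximation, the Wilson interval stays informative at zero successes (0/14 gives [0%, 22%]).

```python
import math


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the binomial proportion successes/n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


def fmt(successes: int, n: int) -> str:
    """Format like the summary table: 'k/n (rate) [lo, hi]'."""
    lo, hi = wilson_ci(successes, n)
    return f"{successes}/{n} ({successes / n:.0%}) [{lo:.0%}, {hi:.0%}]"


for k, n in [(5, 14), (0, 14), (147, 791), (13, 118)]:
    print(fmt(k, n))
```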

A4a — CodeSignal top-10% special advances (n = 14)

Goal: did this non-standard pathway add value, or did we let in a batch of false positives?

The benchmark is the marginal regularly-advanced applicant. If specials rank at a similar rate to the bottom of the regular advances, the pathway is roughly break-even. If they rank lower, it cost us; if higher, it gained us.
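One way to put a number on "lower / similar / higher" for A4a is a Fisher exact test on the 2×2 table of ranked vs. not ranked for the 14 specials and the bottom 14 regular advances. The report doesn't say which test, if any, it used; this is a stdlib-only sketch with an illustrative function name.

```python
from math import comb


def fisher_exact_two_sided(a: int, n1: int, c: int, n2: int) -> float:
    """Two-sided Fisher exact p-value for a/n1 successes vs c/n2 successes.

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    k = a + c                       # total successes (fixed margin)
    total = comb(n1 + n2, k)

    def prob(x: int) -> float:      # P(group 1 has x of the k successes)
        return comb(n1, x) * comb(n2, k - x) / total

    p_obs = prob(a)
    lo, hi = max(0, k - n2), min(k, n1)
    # Small relative tolerance so ties with the observed table are counted.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# CodeSignal specials (5/14 ranked) vs bottom-14 regular advances (0/14 ranked).
print(round(fisher_exact_two_sided(5, 14, 0, 14), 3))  # prints 0.041
```

With these counts the sketch returns about 0.041 (two-sided), which should match `scipy.stats.fisher_exact([[5, 9], [0, 14]])`. An exact test suits this comparison better than a z-test because one cell is zero and both groups have n = 14.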

A4b — Neel trainee audit (n = 23)

Two questions:

  1. Of Neel's 23 trainees in the regular pipeline, how many made it through? Group A (9 passed Stage 2): 5 ranked. Group B (14 failed Stage 2): 8 ranked, necessarily via other paths (possibly Neel's own process).
  2. Did the composite catch them or miss them? See box plot below: Group B sits clearly below the Stage-2→Stage-3 cutoff. The composite identifies them as below the bar, not as borderline misses.

Neel's selection ≠ the composite's selection. Whether that's because the composite undervalues something Neel sees (or vice versa) needs a 3-way look against Neel's own ranking — out of scope here.

A4c — Stream-rec topped-up (n = 133)

Topped-ups split:
- Non-Nanda (118): ranked at 11%.
- Nanda-only (15): ranked at 60% (Nanda runs a separate process; this counts ranks from any stream).

For non-Nanda topped-ups, compare against the broader Stage-3 empirical ranked rate to gauge whether this stream-rec pathway selects for applicants the composite missed.
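A sketch of that comparison as a pooled two-proportion z-test, using the ranked counts from the group-level summary. It assumes the two groups are independent samples; if the 791 reference applicants include the topped-ups, they should be removed from the reference before testing. The function name is illustrative.

```python
import math


def two_proportion_z(k1: int, n1: int, k2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p).

    Assumes the two groups are independent samples.
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Non-Nanda topped-ups (13/118 ranked) vs all empirical Stage 3 (147/791 ranked).
# Caveat: if topped-ups are counted inside the 791, subtract them from the
# reference first; otherwise the groups overlap and the test is not valid.
z, p = two_proportion_z(13, 118, 147, 791)
print(f"rate gap {13 / 118 - 147 / 791:+.1%}, z = {z:.2f}, p = {p:.3f}")
```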

Takeaways

  1. CodeSignal specials added clear value. They were ranked at ~36% vs ~0% for the bottom-14 of regular advances — a substantial difference even with n=14.
  2. Neel Group B people were correctly identified as below the bar by the composite (their composite distribution clearly sits below the Stage-2→Stage-3 cutoff). The composite isn't 'missing' them in the technical sense. Whether they're missed in the talent sense depends on Neel's own selection criteria, which the composite isn't trying to capture.
  3. Non-Nanda stream-rec topped-ups were ranked at ~11%: above the floor (Stage-2-rejected: 0%) but below the ranked rate for all empirical Stage-3 applicants (~19%). The pathway is doing something positive, but modestly. For 11.0 we should consider how to make the LLM stream-rec top-up criteria more selective.
🔧 Debug: how the data was interpreted (safe to skip)

Sample. Canonical 10.0 sample (deduped). Comparison groups:
- All empirical Stage 3 (reference)
- Bottom 14 of regularly-advanced empirical Stage 3 (composite-ordered, non-special) — for A4a's 'are specials worse than the bottom of the regular pool?'
- Empirical-track Stage-2-rejected pool (excluding specials) — for the broader comparison
- Neel trainees: 9 passed Stage 2 (Group A) / 14 did not (Group B) per neel_fellows_audit.md
- Stream-rec topped-ups: split into non-Nanda and Nanda-only
Neel trainee person_ids resolved via identity map; 23/23 matched.
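A sketch of how the bottom-14 benchmark could be drawn from the deduped sample. The `Applicant` record and its fields (`special_pathway`, `reached_stage3`, and so on) are hypothetical stand-ins; the report doesn't show the real schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Applicant:
    # Hypothetical schema; the real sample's column names are not given in the report.
    person_id: str
    track: str                      # e.g. "Empirical"
    reached_stage3: bool
    special_pathway: Optional[str]  # e.g. "codesignal", "stream_rec"; None = regular advance
    composite: float
    is_ranked: bool


def bottom_regular_advances(rows: list[Applicant], k: int = 14) -> list[Applicant]:
    """Return the k lowest-composite empirical Stage-3 applicants who advanced normally."""
    regular = [
        r for r in rows
        if r.track == "Empirical" and r.reached_stage3 and r.special_pathway is None
    ]
    return sorted(regular, key=lambda r: r.composite)[:k]


def ranked_count(rows: list[Applicant]) -> str:
    """Format the ranked count as 'hits/n', matching the summary table."""
    return f"{sum(r.is_ranked for r in rows)}/{len(rows)}"


# Toy example with made-up applicants (k=2 instead of 14).
sample = [
    Applicant("a", "Empirical", True, None, 0.4, False),
    Applicant("b", "Empirical", True, "codesignal", 0.1, True),
    Applicant("c", "Empirical", True, None, 0.2, False),
    Applicant("d", "Empirical", True, None, 3.0, True),
    Applicant("e", "Policy & Strategy", True, None, 0.0, False),
]
bottom = bottom_regular_advances(sample, k=2)
print([r.person_id for r in bottom], ranked_count(bottom))  # prints ['c', 'a'] 0/2
```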

Outcome variable(s). Primary: is_ranked. Secondary: is_invited_to_worktest. Tertiary: passed_mentors_bar (offered or waitlisted).

Predictor fields. Group membership is the predictor. Composite scores reported descriptively for each group.

Filters applied. Canonical dedup applied. Nanda is not excluded from the topped-up set; instead, topped-up rows are split into Nanda-only and non-Nanda so the reader can see Nanda's behavior separately. Per the standing convention for these analyses, Nanda is excluded from per-stream analyses but kept in pool-level ones.
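A sketch of the Nanda split. The report doesn't define "Nanda-only" precisely; this assumes it means topped-up applicants whose only recommending stream was Nanda's, with everyone else counted as non-Nanda, so the two groups partition the topped-up set (as 15 + 118 = 133 does). The input mapping and the `"nanda"` stream key are hypothetical.

```python
def split_topped_up(
    recs_by_person: dict[str, set[str]], nanda_stream: str = "nanda"
) -> tuple[set[str], set[str]]:
    """Partition topped-up person_ids into (nanda_only, non_nanda).

    Assumption: 'Nanda-only' means Nanda's stream is the sole recommending stream.
    """
    nanda_only = {pid for pid, streams in recs_by_person.items() if streams == {nanda_stream}}
    non_nanda = set(recs_by_person) - nanda_only
    return nanda_only, non_nanda


# Toy input: person_id -> streams that recommended the person (hypothetical).
recs = {
    "p1": {"nanda"},
    "p2": {"nanda", "stream_x"},
    "p3": {"stream_y"},
}
nanda_only, non_nanda = split_topped_up(recs)
print(sorted(nanda_only), sorted(non_nanda))  # prints ['p1'] ['p2', 'p3']
```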

Missing-data handling. No imputation. n reported per group.

Key assumptions / caveats.