B8 — Do returning applicants do better than first-timers?

Context

567 of ~2,200 (≈26%) 10.0 applicants self-identified as returning applicants, i.e., they had applied to MATS in a previous cohort. Do they perform better than first-time applicants? Possible reasons they might: prior experience with the application form and process, more developed AI safety thinking, or self-selection (people who got close last time are more likely to re-apply). Reasons they might not: if they were rejected last time and their circumstances haven't changed much, they may face the same outcome.

This analysis is descriptive: it compares returning and first-time applicants in the full pool and, conditionally, among those who reached Stage 3 on the empirical track. We don't know which specific prior cohort they returned from (the form just asks Yes/No).

MATS (Machine Alignment, Transparency & Security) is an AI safety research fellowship that places ~120 fellows with ~100 mentors per cohort. Cohort 10.0 ran in summer 2026 and was the first cohort with a centralized application review instead of decentralized stream-specific review. This analysis is part of a broader effort to evaluate the 10.0 process and inform the design of 11.0 (autumn 2026).

How the 10.0 selection pipeline worked

The 10.0 pipeline in brief. ~2,200 people applied. Each applicant went through three stages:

  1. Stage 1 — submitted background / experience / motivation, picked which research tracks they were interested in (Empirical, Policy & Strategy, Technical Governance, Theory, Compute Infrastructure). An LLM screen filtered out applicants who clearly didn't meet a minimum bar, and produced advisory per-stream recommendations.
  2. Stage 2 — applicants who passed Stage 1 had their materials scored by LLM-graded rubrics. The empirical track used a composite score combining Research Skills, Technical Execution (split into MLE, SWE, Math sub-scores), and Soft Skills. The top ~600 by composite advanced to Stage 3.
  3. Stage 3 — applicants chose specific mentors / "streams" to apply to. Each stream reviewed its applicants and produced a ranked list. Top-ranked applicants got offers; lower-ranked got waitlisted. ~120 offers were made.

For the empirical track, the composite formula is 0.50·RS + 0.35·TE + 0.15·SS, where TE = 0.50·MLE + 0.30·SWE + 0.20·Math. A "relevance multiplier" (Direct=1.0 / Adjacent=0.85 / Distant=0.60) is applied to Research Skills based on how the applicant's experience matches the streams they applied to.
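The composite formula above can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual code: the function names and the example scores are hypothetical, and it assumes the relevance multiplier scales Research Skills before the 0.50 weight is applied, which is one reading of "applied to Research Skills".

```python
# Relevance multipliers as stated in the text.
RELEVANCE = {"Direct": 1.0, "Adjacent": 0.85, "Distant": 0.60}


def technical_execution(mle, swe, math_score):
    """TE = 0.50*MLE + 0.30*SWE + 0.20*Math."""
    return 0.50 * mle + 0.30 * swe + 0.20 * math_score


def empirical_composite(rs, mle, swe, math_score, ss, relevance="Direct"):
    """Composite = 0.50*RS + 0.35*TE + 0.15*SS.

    Assumption: the relevance multiplier is applied to RS before weighting.
    """
    adjusted_rs = RELEVANCE[relevance] * rs
    te = technical_execution(mle, swe, math_score)
    return 0.50 * adjusted_rs + 0.35 * te + 0.15 * ss


# Hypothetical applicant with every sub-score equal to 3.0.
for level in RELEVANCE:
    print(level, round(empirical_composite(3.0, 3.0, 3.0, 3.0, 3.0, level), 3))
```

Because the weights sum to 1 at each level, an applicant with identical sub-scores and Direct relevance gets that same value as their composite; Adjacent or Distant relevance lowers only the Research Skills half of the score.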

Outcome definitions

Outcome definitions used throughout these analyses:

Headline

In the full pool, returning applicants are ranked at 15.2% vs 6.2% for first-timers — about 2.4× the rate.

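Conditional on reaching Stage 3 empirical, the gap narrows in relative terms, from about 2.4× to about 1.8× (returning 25.9% vs first-time 14.6%), but not in absolute terms: the difference is about 11 percentage points, versus about 9 in the full pool. Returning applicants are more likely to clear the early gates, and once in Stage 3 they are still ranked at a higher rate; the two confidence intervals in the Summary table do not overlap.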
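The headline rates can be recomputed from the counts in the Summary table. The sketch below reports both the ratio and the percentage-point difference for each sample, since the two measures of "gap" need not move together. The dictionary layout and function names are illustrative choices, not part of the original analysis.

```python
# (n ranked, n) per group, copied from the Summary table.
counts = {
    "full pool": {"returning": (86, 567), "first_time": (102, 1633)},
    "stage-3 empirical": {"returning": (72, 278), "first_time": (75, 513)},
}


def compare(sample):
    """Return ranked rates, their ratio, and their difference for one sample."""
    ranked_r, n_r = counts[sample]["returning"]
    ranked_f, n_f = counts[sample]["first_time"]
    rate_r = ranked_r / n_r
    rate_f = ranked_f / n_f
    return {
        "returning": rate_r,
        "first_time": rate_f,
        "ratio": rate_r / rate_f,
        "difference": rate_r - rate_f,
    }


for sample in counts:
    c = compare(sample)
    print(
        f"{sample}: {c['returning']:.1%} vs {c['first_time']:.1%}, "
        f"ratio {c['ratio']:.2f}, difference {c['difference'] * 100:.1f} pp"
    )
```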

Summary

Sample             Group       n      n ranked  P(ranked) [95% CI]      Mean composite
Full pool          Returning   567    86        15.2% [12.5%, 18.4%]    1.93
Full pool          First-time  1,633  102       6.2% [5.2%, 7.5%]       1.51
Stage-3 empirical  Returning   278    72        25.9% [21.1%, 31.4%]    2.51
Stage-3 empirical  First-time  513    75        14.6% [11.8%, 17.9%]    2.44
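The source does not say how the 95% intervals were computed. A Wilson score interval reproduces them to within rounding, so the sketch below uses that method as an assumption; the function name `wilson_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# (n ranked, n) for each row of the Summary table.
rows = {
    "Full pool, returning": (86, 567),
    "Full pool, first-time": (102, 1633),
    "Stage-3 empirical, returning": (72, 278),
    "Stage-3 empirical, first-time": (75, 513),
}

for label, (ranked, n) in rows.items():
    low, high = wilson_interval(ranked, n)
    print(f"{label}: {ranked / n:.3f} [{low:.3f}, {high:.3f}]")
```

The Wilson interval is a common choice for proportions because, unlike the simple normal approximation, it stays inside [0, 1] and behaves sensibly for small rates such as the 6.2% first-time figure.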

Composite distributions

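In the full pool, returning applicants have a noticeably higher mean composite (1.93 vs 1.51 for first-timers). This is consistent with the funnel pattern: the composite is what gates Stage 2, so a higher composite means they pass that stage more reliably. Among applicants who reached Stage-3 empirical, the means are close (2.51 vs 2.44).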

Takeaways

  1. Returning applicants outperform first-timers in the full pool — they're more likely to be ranked and more likely to get an offer.
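  2. Clearing the earlier stages accounts for part of the gap: 278 of 567 returning applicants (49.0%) reached Stage-3 empirical, versus 513 of 1,633 first-timers (31.4%). Once both groups are in Stage 3, the relative difference narrows (about 1.8× versus about 2.4× in the full pool), but it does not disappear.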
  3. Possible mechanisms: returning applicants self-select (rejected applicants without growth in profile may not re-apply); they may have better-developed AI safety thinking by their second attempt; and they may simply know how to write a stronger Stage-1 application having seen one before.
  4. For 11.0: it's worth tracking returning applicants more carefully — knowing which specific prior cohort they returned from, and whether they were ranked then, would help disentangle self-selection from genuine year-over-year improvement.

🔧 Debug: how the data was interpreted (safe to skip)

Sample. Canonical 10.0 sample. Returning: 567. First-time: 1,633. 3 missing 'returning?' answers excluded.

Outcome variable(s). is_ranked (primary), is_invited_to_worktest, passed_mentors_bar.

Predictor fields. [stage-1-basic] Returning applicant? (Yes/No).

Filters applied. Canonical dedup.

Missing-data handling. NaN returning-status (n=3) excluded.
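As a rough illustration of the exclusion rule, the sketch below drops rows with a missing returning-status before computing P(ranked) per group. The row layout, the `returning` key, and the toy data are hypothetical; only `is_ranked` is a name taken from this write-up, and the canonical dedup step is not shown.

```python
def ranked_rate_by_group(rows):
    """Compute P(ranked) per returning-status, skipping missing answers."""
    totals = {}  # status -> (n, n_ranked)
    for row in rows:
        status = row["returning"]
        if status is None:  # missing 'returning?' answer: excluded
            continue
        n, ranked = totals.get(status, (0, 0))
        totals[status] = (n + 1, ranked + int(row["is_ranked"]))
    return {status: ranked / n for status, (n, ranked) in totals.items()}


# Toy data for illustration only, not real applicants.
toy_rows = [
    {"returning": "Yes", "is_ranked": True},
    {"returning": "Yes", "is_ranked": False},
    {"returning": "No", "is_ranked": False},
    {"returning": "No", "is_ranked": False},
    {"returning": "No", "is_ranked": True},
    {"returning": "No", "is_ranked": False},
    {"returning": None, "is_ranked": True},  # dropped from both groups
]

print(ranked_rate_by_group(toy_rows))
```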

Key assumptions / caveats.