Many of our individual selection signals are weak in isolation — CodeSignal, ToC alignment, AIS engagement count, research-taste test all carry modest predictive power for ranking (Parts A and B). But maybe they're picking up different aspects of applicant quality. If we count how many of these weak signals an applicant is above-median on, does that count predict ranking better than any single signal alone?
Practical question: if a stream is on the fence about a borderline candidate, would knowing "they're above median on 4 of the 5 signals" be useful information beyond the composite score alone?
MATS (Machine Alignment, Transparency & Security) is an AI safety research fellowship that places ~120 fellows with ~100 mentors per cohort. Cohort 10.0 ran in summer 2026 and was the first cohort with a centralized application review instead of decentralized stream-specific review. This analysis is part of a broader effort to evaluate the 10.0 process and inform the design of 11.0 (autumn 2026).
The 10.0 pipeline in brief. ~2,200 people applied. Each applicant went through three stages:
For the empirical track, the composite formula is 0.50·RS + 0.35·TE + 0.15·SS, where TE = 0.50·MLE + 0.30·SWE + 0.20·Math. A "relevance multiplier" (Direct=1.0 / Adjacent=0.85 / Distant=0.60) is applied to Research Skills based on how the applicant's experience matches the streams they applied to.
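As a worked illustration of the formula above, here is a minimal Python sketch. It assumes the relevance multiplier scales RS before the 0.50 weight is applied and that all component scores share a common scale; neither is stated explicitly in the text. The function names and the example scores are hypothetical.

```python
# Relevance multipliers as given in the text.
RELEVANCE = {"Direct": 1.0, "Adjacent": 0.85, "Distant": 0.60}


def te_score(mle, swe, math_score):
    """TE = 0.50*MLE + 0.30*SWE + 0.20*Math."""
    return 0.50 * mle + 0.30 * swe + 0.20 * math_score


def empirical_composite(rs, mle, swe, math_score, ss, relevance="Direct"):
    """Composite = 0.50*RS + 0.35*TE + 0.15*SS for the empirical track.

    Assumption: the relevance multiplier is applied to RS before weighting.
    """
    adjusted_rs = rs * RELEVANCE[relevance]
    te = te_score(mle, swe, math_score)
    return 0.50 * adjusted_rs + 0.35 * te + 0.15 * ss


# Hypothetical applicant with an "Adjacent" background.
example = empirical_composite(rs=8, mle=6, swe=7, math_score=5, ss=9,
                              relevance="Adjacent")
print(round(example, 3))
```

Under these assumptions, moving the same applicant from "Direct" to "Distant" lowers the composite by 0.50 × 0.40 × RS, so the multiplier matters most for applicants with high Research Skills scores.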
Outcome definitions used throughout these analyses:
is_ranked (primary outcome): applicant was ranked by ≥1 stream. This is the cleanest signal of "the selection process picked this person." Not the same as "received an offer": offer count is bounded by cohort size (~120), but rank count reflects quality independently of capacity.

is_invited_to_worktest (secondary outcome): applicant was engaged by ≥1 stream in any way: invited to a work test, invited to an interview, ranked, or sent the Megastream takehome. Strict superset of is_ranked. One level above is_ranked in the funnel.

passed_mentors_bar: applicant was offered or waitlisted. In 10.0, this equals is_ranked exactly (every ranked person got either an offer or a waitlist slot).

| # signals above median | n | n ranked | P(ranked) |
|---|---|---|---|
| 0 | 91 | 8 | 8.8% |
| 1 | 195 | 18 | 9.2% |
| 2 | 246 | 42 | 17.1% |
| 3 | 161 | 36 | 22.4% |
| 4 | 74 | 27 | 36.5% |
| 5 | 24 | 16 | 66.7% |
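The pattern is monotone: P(ranked) rises at every step, from about 9% for applicants with 0–1 above-median signals (26 of 286) to 36.5% at 4 signals and 66.7% at 5. The 0-to-1 step is essentially flat (8.8% vs. 9.2%), and the 5-signal cell is small (n=24), so its rate is the least precisely estimated.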
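To gauge how much weight each row can bear, one option is to put an interval around each rate. The sketch below recomputes P(ranked) from the table counts and adds a 95% Wilson score interval. The Wilson interval is our choice for illustration; it is not a method the analysis itself reports using.

```python
import math

# (signals above median, n, n ranked), copied from the table above.
ROWS = [(0, 91, 8), (1, 195, 18), (2, 246, 42),
        (3, 161, 36), (4, 74, 27), (5, 24, 16)]


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


for count, n, k in ROWS:
    lo, hi = wilson_interval(k, n)
    print(f"{count} signals: {k}/{n} = {k / n:.1%}  [{lo:.1%}, {hi:.1%}]")
```

The interval for the 5-signal row is much wider than for the middle rows, since it rests on only 24 applicants.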
| Predictor | n | AUC | 95% CI |
|---|---|---|---|
| composite | 791 | 0.678 | [0.631, 0.724] |
| codesignal | 676 | 0.606 | [0.550, 0.658] |
| toc | 791 | 0.612 | [0.561, 0.668] |
| rt | 393 | 0.634 | [0.576, 0.692] |
| ais_count | 791 | 0.572 | [0.521, 0.622] |
| agreement_count | 791 | 0.680 | [0.633, 0.729] |
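The agreement count's AUC (0.680) is essentially identical to the composite's (0.678), and the two confidence intervals overlap almost completely. A coarse 0–5 count of above-median flags therefore discriminates about as well as the continuous composite, and has a higher point estimate than any single secondary signal, but it does not clearly beat the composite. Note that the count includes the composite's own above-median flag, so the two predictors are not independent, and that the research-taste AUC is computed on a smaller subset (n=393).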
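The agreement_count row can be reproduced from the counts table alone, because that table fixes the joint distribution of agreement count and is_ranked. The sketch below rebuilds the 791 (count, label) pairs, computes AUC with the tie-aware rank-sum (Mann-Whitney) formula, and attaches a percentile bootstrap interval. The analysis does not say how its confidence intervals were computed, so the bootstrap, the seed, and the number of resamples are assumptions.

```python
import random

# (signals above median, n, n ranked), copied from the counts table.
ROWS = [(0, 91, 8), (1, 195, 18), (2, 246, 42),
        (3, 161, 36), (4, 74, 27), (5, 24, 16)]


def auc(scores, labels):
    """AUC via the rank-sum formula, using average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both ranked and unranked applicants")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(scores, labels, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples containing a single class
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


# Expand the table into one (score, label) pair per applicant.
scores, labels = [], []
for count, n, k in ROWS:
    scores += [count] * n
    labels += [1] * k + [0] * (n - k)

point = auc(scores, labels)
ci_lo, ci_hi = bootstrap_ci(scores, labels)
print(f"agreement_count AUC = {point:.3f}  [{ci_lo:.3f}, {ci_hi:.3f}]")
```

The point estimate comes out at approximately 0.680, matching the table. The bootstrap bounds will differ slightly from the reported interval depending on the seed and on whatever method the original analysis used.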
| Group | n | P(ranked) |
|---|---|---|
| Composite above median, ≤1 other signal above | 188 | 16.5% |
| Composite below median, ≥2 other signals above | 170 | 18.2% |
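Applicants with a below-median composite but at least two other above-median signals are ranked at 18.2%, slightly above the 16.5% rate for applicants with an above-median composite and little other support. The gap is small relative to the group sizes (n=170 and n=188), so this is at most modest evidence that the secondary signals add information at the margin.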
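The two disagreement groups in the table above can be defined mechanically from the five binary flags. The following is a minimal sketch; the flag keys and group labels are hypothetical names chosen for illustration, not fields from the analysis dataset.

```python
# Hypothetical keys for the four non-composite flags.
OTHER_SIGNALS = ("codesignal", "toc", "rt", "ais_count")


def disagreement_group(flags):
    """Classify an applicant from a dict of 0/1 above-median flags.

    Returns "composite_only" (composite above median, <=1 other signal above),
    "others_only" (composite below median, >=2 other signals above),
    or None when the applicant falls in neither group.
    """
    others = sum(flags[name] for name in OTHER_SIGNALS)
    if flags["composite"] and others <= 1:
        return "composite_only"
    if not flags["composite"] and others >= 2:
        return "others_only"
    return None


example = {"composite": 0, "codesignal": 1, "toc": 1, "rt": 0, "ais_count": 0}
print(disagreement_group(example))
```

Applicants on whom the composite and the other signals agree (both high or both low) fall in neither group, which is why the two rows cover only 358 of the 791 applicants.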
Sample. Stage-3 empirical pool (n=791).
Outcome variable(s). is_ranked.
Predictor fields. Five binary above-median flags: composite, CodeSignal, ToC alignment, research-taste final, AIS engagement count. Each thresholded at the Stage-3-empirical median. Agreement count = sum of flags (0–5).
Filters applied. Stage-3 empirical filter. Canonical dedup.
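Missing-data handling. Each flag's threshold is the median of that signal's non-null values. Missing values do NOT contribute to the agreement count, so an applicant with a missing signal cannot score on that flag. This matters most for research-taste (393 of 791 with a score) and CodeSignal (676 of 791).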
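The flag construction and missing-data rule described above can be sketched as follows. Two details are assumptions rather than facts from the analysis: "above median" is read as strictly greater than the median, and a missing value is treated as a 0 flag. The field names and toy records are hypothetical.

```python
from statistics import median

# Hypothetical field names for the five signals.
SIGNALS = ("composite", "codesignal", "toc", "rt", "ais_count")


def above_median_flags(applicants):
    """Return (per-applicant flag dicts, thresholds).

    Thresholds are medians over non-null values only. A missing value
    yields a 0 flag (assumption: missing never counts as above median).
    """
    thresholds = {}
    for s in SIGNALS:
        observed = [a[s] for a in applicants if a.get(s) is not None]
        thresholds[s] = median(observed)
    flags = [
        {s: int(a.get(s) is not None and a[s] > thresholds[s]) for s in SIGNALS}
        for a in applicants
    ]
    return flags, thresholds


def agreement_count(flag_row):
    """Number of signals (0-5) on which the applicant is above median."""
    return sum(flag_row.values())


# Toy records; None marks a missing score.
pool = [
    {"composite": 1, "codesignal": 10, "toc": 1, "rt": None, "ais_count": 0},
    {"composite": 2, "codesignal": None, "toc": 2, "rt": 5, "ais_count": 1},
    {"composite": 3, "codesignal": 30, "toc": 3, "rt": 7, "ais_count": 2},
]
flags, thresholds = above_median_flags(pool)
counts = [agreement_count(f) for f in flags]
print(thresholds, counts)
```

With a strict inequality, applicants sitting exactly at the median get a 0 flag. For a coarse integer signal such as AIS engagement count, where ties at the median are likely, the choice between ">" and ">=" can move a noticeable share of applicants.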
Key assumptions / caveats.