Part B — 10.0 process & design questions

Context

Part A asked whether the central 10.0 selection rubric works (it broadly does). Part B steps through a series of process and design questions that came up during 10.0, each informing a specific 11.0 decision. It comprises nine analyses, mostly descriptive, with regression where the data supports it.

How the 10.0 selection pipeline worked (refresher)

~2,200 people applied. Each applicant went through three stages:

  1. Stage 1: applicants submitted their background, picked tracks, and took an LLM-graded screen. Some applicants were filtered out here.
  2. Stage 2: an LLM-graded rubric assigned a composite score; the top ~600 went to Stage 3.
  3. Stage 3: applicants chose specific streams (mentor-led projects) to apply to. Streams ranked applicants; ~120 offers went out.

For the empirical track, composite = 0.50·RS + 0.35·TE + 0.15·SS with TE = 0.50·MLE + 0.30·SWE + 0.20·Math; RS multiplied by a relevance multiplier (1.0/0.85/0.6 for Direct/Adjacent/Distant).
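The weighting above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code: the function names, the 0-10 score scale, and the example inputs are all hypothetical, and it assumes the relevance multiplier is applied to RS before the 0.50 weight.

```python
# Sketch of the empirical-track composite described above.
# Assumptions (not from the 10.0 pipeline itself): function names,
# a shared 0-10 scale for every component, and that the relevance
# multiplier scales RS before the 0.50 weight is applied.

RELEVANCE_MULTIPLIER = {"Direct": 1.0, "Adjacent": 0.85, "Distant": 0.6}


def technical_execution(mle: float, swe: float, math_score: float) -> float:
    """TE = 0.50*MLE + 0.30*SWE + 0.20*Math."""
    return 0.50 * mle + 0.30 * swe + 0.20 * math_score


def empirical_composite(
    rs: float,
    ss: float,
    mle: float,
    swe: float,
    math_score: float,
    relevance: str,
) -> float:
    """composite = 0.50*(RS * multiplier) + 0.35*TE + 0.15*SS."""
    adjusted_rs = rs * RELEVANCE_MULTIPLIER[relevance]
    te = technical_execution(mle, swe, math_score)
    return 0.50 * adjusted_rs + 0.35 * te + 0.15 * ss


if __name__ == "__main__":
    # Hypothetical applicant scored under each relevance category.
    for relevance in RELEVANCE_MULTIPLIER:
        score = empirical_composite(8.0, 6.0, 7.0, 6.0, 5.0, relevance)
        print(f"{relevance}: {score:.3f}")
```

Under this reading, moving an applicant from Direct to Distant lowers the composite by 0.50 · 0.4 · RS, so the penalty grows with the applicant's RS score.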

Headline findings

  1. The funnel is dominated by Stage 2 → Stage 3 filtering. There were 2,203 canonical applications, and ~60% of those who passed Stage 1 were dropped at Stage 2. By the end, 189 (8.6%) were ranked and 126 (5.7%) received offers. B1 results.
  2. ~73% of committed applicants matched with a top-3 preference stream. A substantial minority of applicants matched with streams they didn't initially flag at Stage 1, which supports making Stage-1 stream questions optional / low-friction for 11.0. B2 results.
  3. ToC alignment score is the strongest of the mission-alignment signals. AUC = 0.65 in the full pool, attenuating to ~0.61 in Stage 3. Multi-select AIS-engagement count and free-text duration carry less signal. (Important caveat: the 10.0 AIS form had a bug that swapped the secondary detail panels. The main multi-select and the duration field are unaffected, and this analysis uses only those.) B3 results.
  4. "AI safety org" references carry the strongest reference-type signal (+12.4% lift in P(ranked) vs applicants without one). Government / policy / other-industry refs skew negative. B4 results.
  5. The policy/gov rubric works. P&S composite AUC = 0.71, TG composite AUC = 0.75. Analytical Communication is doing real work as the policy-track analog of empirical's Technical Execution. The dual relevance multipliers pull their weight. B5 results.
  6. Composite has a strong floor in Stage 3. The bottom several deciles of Stage-3 composite rank at near-zero rates. The curve is concave with diminishing returns above the top decile. Supports using composite as a hard floor rather than just an informational signal. B6 results.
  7. AI-detected text in applications is a modest negative signal. AUC ~0.57 for max fraction_ai → not ranked. Composite is lower for high-fraction_ai applicants, suggesting reviewers are implicitly (or explicitly) discounting AI-written content. B7 results.
  8. Returning applicants outperform first-timers — but mostly via clearing earlier stages. 86/567 returning applicants got ranked (15.2%) vs 102/1633 first-timers (6.2%). Conditional on reaching Stage 3, the gap narrows substantially. B8 results.
  9. "Applying to more streams" is mostly a quality proxy. Stream count correlates +0.43 with composite. After controlling for composite in a joint logit, the marginal effect of applying to more streams is small. No strong case for capping #streams in 11.0. B9 results.
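The percentages quoted in findings 1 and 8 follow directly from the counts given there. The short sketch below reproduces them as a sanity check; the helper name `pct` is illustrative, and the denominators are simply the ones quoted in each finding.

```python
# Reproduce the headline rates from the counts quoted in findings 1 and 8.
# `pct` is a hypothetical helper name; all counts come from the text above.

def pct(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a percentage."""
    return 100.0 * numerator / denominator


CANONICAL_APPLICATIONS = 2203

ranked_rate = pct(189, CANONICAL_APPLICATIONS)   # ranked, finding 1
offer_rate = pct(126, CANONICAL_APPLICATIONS)    # offers, finding 1
returning_rate = pct(86, 567)                    # returning applicants ranked, finding 8
first_time_rate = pct(102, 1633)                 # first-timers ranked, finding 8

if __name__ == "__main__":
    print(f"ranked:      {ranked_rate:.1f}%")
    print(f"offers:      {offer_rate:.1f}%")
    print(f"returning:   {returning_rate:.1f}%")
    print(f"first-time:  {first_time_rate:.1f}%")
    print(f"returning / first-time ratio: {returning_rate / first_time_rate:.2f}")
```

On these counts, returning applicants were ranked at roughly 2.4 times the rate of first-timers before conditioning on stage reached.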

11.0 implications (tentative)

Individual reports

| Analysis | Question | n |
| --- | --- | --- |
| B1: Selection funnel | How many applicants made it to each stage? | 2,203 canonical |
| B2: Stream matching & self-selection | Are people matching with their top stream picks? Do they end up at streams they didn't initially flag? | 108 committers, 1,064 Stage-3 |
| B3: Mission alignment signals | Do AIS engagement signals predict ranking? | 2,206 ToC scores |
| B4: Reference signal value | Do references (and their type) predict ranking? | ~1,600 with ≥1 ref |
| B5: Policy/governance pipeline | Does the policy/gov rubric work? | P&S: 445, TG: ~250 |
| B6: Percentile vs progression | Floor + diminishing returns of composite percentile | ~604 Stage-3 empirical |
| B7: Pangram / AI-detection | Did AI-written text in applications hurt outcomes? | ~2,200 with Pangram signal |
| B8: Returning applicants | Do returning applicants do better? | 569 returning, 1,638 first-time |
| B9: Application strategy | Does applying to more streams help? | 1,064 Stage-3 applicants |

Errors encountered during Part B

None unrecovered. One in-flight caveat surfaced: the 10.0 AIS engagement form had a UI bug that swapped the secondary "research program" and "structured course" detail panels (Sanyu flagged at B5 checkpoint). This affects only the secondary detail fields; this analysis (B3) uses the main multi-select count and the duration field, both of which are unaffected. Caveat is documented in B3 and saved to project memory for future analyses.