B7 — Did AI-detected text in applications hurt outcomes?

Context

Pangram is an AI-text-detection tool. For the 10.0 cohort, applicant free-text fields were run through Pangram and a fraction_ai score was stored per field (0 = human-written, 1 = high-confidence AI-generated, -1 = not analyzed or not enough text). These scores let us ask several questions. How prevalent is AI-generated text in 10.0 applications? Does it correlate with worse Stage-2 scores, or with a lower P(ranked)? Do reviewers seem to be discounting it, implicitly or explicitly?

MATS (Machine Alignment, Transparency & Security) is an AI safety research fellowship that places ~120 fellows with ~100 mentors per cohort. Cohort 10.0 ran in summer 2026 and was the first cohort with a centralized application review instead of decentralized stream-specific review. This analysis is part of a broader effort to evaluate the 10.0 process and inform the design of 11.0 (autumn 2026).

How the 10.0 selection pipeline worked

The 10.0 pipeline in brief. ~2,200 people applied. Each applicant went through three stages:

  1. Stage 1 — submitted background / experience / motivation, picked which research tracks they were interested in (Empirical, Policy & Strategy, Technical Governance, Theory, Compute Infrastructure). An LLM screen filtered out applicants who clearly didn't meet a minimum bar, and produced advisory per-stream recommendations.
  2. Stage 2 — applicants who passed Stage 1 had their materials scored by LLM-graded rubrics. The empirical track used a composite score combining Research Skills, Technical Execution (split into MLE, SWE, Math sub-scores), and Soft Skills. The top ~600 by composite advanced to Stage 3.
  3. Stage 3 — applicants chose specific mentors / "streams" to apply to. Each stream reviewed its applicants and produced a ranked list. Top-ranked applicants got offers; lower-ranked got waitlisted. ~120 offers were made.

For the empirical track, the composite formula is 0.50·RS + 0.35·TE + 0.15·SS, where TE = 0.50·MLE + 0.30·SWE + 0.20·Math. A "relevance multiplier" (Direct=1.0 / Adjacent=0.85 / Distant=0.60) is applied to Research Skills based on how the applicant's experience matches the streams they applied to.
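The weighting can be sketched in Python as a minimal example. It assumes that the relevance multiplier scales Research Skills before RS receives its 0.50 weight, and that all sub-scores share one numeric scale, which is not stated here. The function names and example sub-scores are hypothetical.

```python
# Minimal sketch of the empirical-track composite (names are hypothetical).
# Assumption: all sub-scores share one numeric scale, and the relevance
# multiplier scales RS before RS receives its 0.50 weight.

RELEVANCE_MULTIPLIER = {"Direct": 1.0, "Adjacent": 0.85, "Distant": 0.60}


def technical_execution(mle: float, swe: float, math: float) -> float:
    """TE = 0.50*MLE + 0.30*SWE + 0.20*Math."""
    return 0.50 * mle + 0.30 * swe + 0.20 * math


def empirical_composite(rs: float, mle: float, swe: float, math: float,
                        ss: float, relevance: str) -> float:
    """Composite = 0.50*(RS * multiplier) + 0.35*TE + 0.15*SS."""
    rs_adjusted = rs * RELEVANCE_MULTIPLIER[relevance]
    te = technical_execution(mle, swe, math)
    return 0.50 * rs_adjusted + 0.35 * te + 0.15 * ss


# Made-up sub-scores for an "Adjacent" applicant.
print(round(empirical_composite(rs=3.0, mle=2.0, swe=3.0, math=1.0,
                                ss=2.0, relevance="Adjacent"), 2))  # → 2.31
```

The multiplier only touches RS. So if everything else stays equal, moving an applicant from Direct to Distant costs 0.50 × 0.40 × RS = 0.20·RS of composite.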

Outcome definitions

Outcome definitions used throughout these analyses:

What Pangram scored

For each applicant we compute max fraction_ai across all analyzed fields — i.e., the highest AI-detection score they hit on any single field. -1 (not analyzed) is treated as missing.
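Here is a minimal sketch of this per-applicant reduction. It assumes each applicant's Pangram output is available as a mapping from field name to fraction_ai; the field names in the example are hypothetical. An applicant with no analyzed field gets None rather than a score. This mirrors restricting the pool to applicants with at least one analyzed field.

```python
# Sketch: per-applicant max fraction_ai, treating -1 as missing.
# Assumption: scores arrive as {field_name: fraction_ai}; field names are hypothetical.
from typing import Mapping, Optional

NOT_ANALYZED = -1.0
FLAG_THRESHOLD = 0.9  # cutoff used for "flagged as AI-generated" in the headline


def max_fraction_ai(scores: Mapping[str, float]) -> Optional[float]:
    """Highest fraction_ai over analyzed fields, or None if nothing was analyzed."""
    analyzed = [s for s in scores.values() if s != NOT_ANALYZED]
    return max(analyzed) if analyzed else None


def is_flagged(scores: Mapping[str, float]) -> Optional[bool]:
    """True if any analyzed field reaches the threshold; None if nothing was analyzed."""
    m = max_fraction_ai(scores)
    return None if m is None else m >= FLAG_THRESHOLD


applicant = {"reasoning_top3": 0.0, "writing_sample_1": -1.0, "empirical_b": 0.95}
print(max_fraction_ai(applicant), is_flagged(applicant))  # → 0.95 True
```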

Caveat — interpretation

Pangram is a model. A 'high fraction_ai' score is not a confession; it's an estimate. False positives exist, especially for non-native-English writers who use ChatGPT or Grammarly for editing rather than generation. We don't treat fraction_ai as ground truth — just as an applicant-level signal.

Headline

Of 2,104 applicants with any Pangram-analyzed text, 772 (37%) had at least one field flagged as AI-generated (max fraction_ai ≥ 0.9).

Higher Pangram scores correlate with worse outcomes, but only modestly:
  - Max fraction_ai → is_ranked (full pool, n=2,104): AUC = 0.574 [0.539, 0.607].
  - Max fraction_ai → composite score: Spearman ρ = -0.090.
  - Stage-3 empirical pool (n=765): AUC = 0.552 [0.510, 0.591]; Spearman ρ with composite = -0.028.
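The rank correlations can be reproduced with the plain-Python sketch of Spearman's ρ below. It ranks each variable, with tied values sharing their average rank, and takes the Pearson correlation of the ranks. Tie handling matters here because max fraction_ai piles up at 0 and 1. The data is a toy example, not MATS data. scipy.stats.spearmanr computes the same quantity if SciPy is available.

```python
# Sketch: Spearman's rho as the Pearson correlation of tie-averaged ranks.
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # positions i..j are ranks i+1..j+1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho; assumes equal-length, non-constant inputs."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)


# Toy data (not MATS data): composite tends to fall as max fraction_ai rises.
max_ai = [0.0, 0.0, 0.3, 1.0, 1.0]
composite = [2.1, 1.8, 1.9, 1.2, 1.5]
print(round(spearman_rho(max_ai, composite), 3))  # → -0.791
```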

The Stage-3 effect is smaller, though the two AUC intervals overlap. This is consistent with most of the Pangram-related signal being absorbed earlier in the pipeline.

Per-field Pangram coverage

Field | n analyzed | Mean fraction_ai | % with fraction_ai ≥ 0.9
--- | --- | --- | ---
Reasoning for top 3 | 1,977 | 0.22 | 21.8%
Writing sample 1 | 378 | 0.16 | 7.1%
Writing sample 2 | 307 | 0.19 | 11.7%
Empirical option A | 516 | 0.38 | 38.0%
Empirical option B | 1,072 | 0.33 | 33.4%
Policy & Strategy option A | 116 | 0.42 | 42.2%
Policy & Strategy option B | 291 | 0.44 | 44.0%
Technical Governance option A | 111 | 0.51 | 51.4%
Technical Governance option B | 236 | 0.52 | 50.8%
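The per-field columns follow from a per-field listwise drop. A -1 entry is excluded from that field's statistics without removing the applicant from other fields. The sketch below assumes rows are per-applicant dicts mapping field name to fraction_ai; the field keys and toy values are hypothetical.

```python
# Sketch: per-field coverage with a per-field listwise drop of -1 entries.
# Field keys and values in the example are hypothetical.
from typing import Dict, List, Mapping


def field_summary(rows: List[Mapping[str, float]],
                  threshold: float = 0.9) -> Dict[str, Dict[str, float]]:
    """Per field: n analyzed, mean fraction_ai, and % of analyzed rows >= threshold."""
    per_field: Dict[str, List[float]] = {}
    for row in rows:
        for field, score in row.items():
            if score == -1:
                continue  # not analyzed: drop for this field only
            per_field.setdefault(field, []).append(score)
    summary: Dict[str, Dict[str, float]] = {}
    for field, scores in per_field.items():
        n = len(scores)
        summary[field] = {
            "n_analyzed": n,
            "mean_fraction_ai": sum(scores) / n,
            "pct_ge_threshold": 100.0 * sum(s >= threshold for s in scores) / n,
        }
    return summary


rows = [
    {"writing_sample_1": 0.0, "empirical_a": 1.0},
    {"writing_sample_1": -1, "empirical_a": 0.95},
    {"writing_sample_1": 0.2, "empirical_a": 0.0},
]
for field, stats in field_summary(rows).items():
    print(field, stats["n_analyzed"], round(stats["pct_ge_threshold"], 1))
```

Each percentage uses its own field's n as the denominator. These figures are therefore not directly comparable to the applicant-level 37%, which counts a flag on any field.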

ToC reasoning Pangram bucket → outcomes

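The ToC reasoning text ("Reasoning for top 3" above) has the broadest coverage: 1,977 analyzed applicants out of ~2,200. We bucketed it into 4 fraction_ai bands. Only three bands are non-empty, and their counts sum to all 1,977 analyzed applicants. The low band holds a single applicant, so only the human-vs-high comparison is informative: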

Bucket | n | P(ranked) | Mean composite
--- | --- | --- | ---
0 (human) | 1,546 | 9.7% | 1.69
low (0.1–0.5) | 1 | 0.0% | 0.00
high (≥0.9) | 430 | 5.1% | 1.47
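The bucketing step can be sketched as follows. The band edges are assumptions, because the report shows only three bands. The sketch treats exactly 0 as human, (0, 0.5] as low (which folds in values below 0.1, a range the table does not specify), (0.5, 0.9) as a hypothetical fourth "mid" band, and ≥0.9 as high. Records are (fraction_ai, is_ranked, composite) tuples, and -1 scores are skipped.

```python
# Sketch: bucket fraction_ai into bands and summarize outcomes per band.
# ASSUMED band edges (not confirmed by the report):
#   exactly 0 -> "human"; (0, 0.5] -> "low"; (0.5, 0.9) -> "mid"; >= 0.9 -> "high".
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def bucket(score: float) -> str:
    """Map a fraction_ai score in [0, 1] to a band label."""
    if score == 0:
        return "human"
    if score >= 0.9:
        return "high"
    if score <= 0.5:
        return "low"
    return "mid"


def bucket_outcomes(
    records: Iterable[Tuple[float, bool, float]],
) -> Dict[str, Tuple[int, float, float]]:
    """Return {band: (n, P(ranked), mean composite)} over analyzed records."""
    groups: Dict[str, List[Tuple[bool, float]]] = defaultdict(list)
    for score, ranked, composite in records:
        if score == -1:
            continue  # not analyzed
        groups[bucket(score)].append((ranked, composite))
    summary = {}
    for band, items in groups.items():
        n = len(items)
        p_ranked = sum(ranked for ranked, _ in items) / n
        mean_composite = sum(c for _, c in items) / n
        summary[band] = (n, p_ranked, mean_composite)
    return summary


records = [(0.0, True, 2.0), (0.0, False, 1.0), (1.0, False, 1.5), (-1, True, 3.0)]
print(bucket_outcomes(records))
# → {'human': (2, 0.5, 1.5), 'high': (1, 0.0, 1.5)}
```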

Distribution of max fraction_ai

The distribution is heavily bimodal. Many applicants score 0 (every analyzed field scored as human-written) or 1 (a high-confidence AI detection on at least one field).

Takeaways

  1. AI-text detection is a meaningful but not dramatic signal. AUC ~0.57 for predicting is_ranked from max fraction_ai in the full pool — comparable to ToC alignment (B3).
  2. Composite scores are lower for high-fraction_ai applicants (a weak negative correlation, Spearman ρ = -0.090). This analysis can't distinguish two explanations. The Stage-2 LLM graders may have been explicitly penalizing AI-written text, or AI-written content may simply score lower on the rubric anyway.
  3. The effect attenuates in Stage 3 — high-fraction_ai applicants tend not to reach Stage 3 in the first place. Among those who do reach it, the residual fraction_ai signal is weaker.
  4. For 11.0: the question isn't whether to keep using Pangram (cheap, decent signal) but whether to make the policy more explicit. The current implicit penalty seems to work; a clearer up-front policy might both deter use and improve fairness.
🔧 Debug: how the data was interpreted (safe to skip)

Sample. Full canonical pool, restricted per-analysis to applicants with at least one Pangram-analyzed field (n=2,104).

Outcome variable(s). is_ranked. Composite-mean reported descriptively.

Predictor fields. Per-field fraction_ai (0–1, with -1 = missing). Per-applicant: max and mean across all analyzed fields. AUC is computed on the negated score, so that AUC > 0.5 means higher fraction_ai goes with a lower P(ranked).
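The AUC with the negated score is sketched below using the Mann-Whitney formulation, where ties count as half. A percentile bootstrap over applicants supplies an interval. The bootstrap is an assumption: this section does not say how the bracketed intervals in the headline were computed, and the resample count and seed here are arbitrary.

```python
# Sketch: AUC of max fraction_ai for predicting is_ranked (score negated),
# plus a percentile-bootstrap interval (an ASSUMED CI method).
import random
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """ROC AUC via the Mann-Whitney U statistic (ties count as 1/2)."""
    ranks = average_ranks(scores)
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def fraction_ai_auc(max_ai: Sequence[float], is_ranked: Sequence[bool]) -> float:
    """Negate so AUC > 0.5 means higher fraction_ai goes with lower P(ranked)."""
    return auc([-s for s in max_ai], is_ranked)


def bootstrap_ci(max_ai: Sequence[float], is_ranked: Sequence[bool],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for fraction_ai_auc, resampling applicants."""
    rng = random.Random(seed)
    n = len(max_ai)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [is_ranked[i] for i in idx]
        if all(ys) or not any(ys):
            continue  # AUC is undefined unless both classes are present
        stats.append(fraction_ai_auc([max_ai[i] for i in idx], ys))
    stats.sort()
    lo = stats[int(round(alpha / 2 * n_boot))]
    hi = stats[int(round((1 - alpha / 2) * n_boot)) - 1]
    return lo, hi


# Toy data (not MATS data).
max_ai = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
is_ranked = [True, True, False, False, False, True]
print(round(fraction_ai_auc(max_ai, is_ranked), 3))  # → 0.667
print(bootstrap_ci(max_ai, is_ranked, n_boot=200))
```

The rank-based formula avoids comparing every ranked/unranked pair. That keeps a 1,000-resample bootstrap on a pool of ~2,100 applicants cheap.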

Filters applied. Canonical dedup. Pangram fields with -1 (not analyzed) treated as missing.

Missing-data handling. Per-field listwise drop. n_fields_analyzed indicates per-applicant coverage.

Key assumptions / caveats.