Large language models predict cognition and education close to or better than genomics or expert assessment

A new paper from Tobias Wolfram examines how well different kinds of childhood data predict cognitive and educational outcomes. Three predictors (a single essay, 22 teacher assessments, and DNA-based polygenic scores) were used to predict IQ, childhood/adolescent academic achievement, adult educational attainment, and adult non-cognitive traits.

LLMs were used to turn the essays into predictor variables. Surprisingly, data from a single essay written at age 11 (avg length = ~250 words) could predict 37-59% of the variance in academic achievement (3rd image). When predicting IQ at age 11, the teacher’s evaluation was the best single predictor (R² = .62), but combining it with polygenic scores and the essay data raised the explained variance to .70 (4th image). According to Wolfram, “The prediction of our best model approaches the test-retest reliability of benchmark intelligence tests” (p. 5).
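For readers less familiar with the metric, "explained variance" (R²) is the share of the spread in an outcome that a model's predictions account for. The sketch below is only an illustration of that arithmetic: the scores are invented, `r_squared` is a hypothetical helper name, and nothing here reproduces the paper's data or method.

```python
def r_squared(observed, predicted):
    """Share of the variance in `observed` accounted for by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    # Total spread of the observed scores around their mean.
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    # Spread left over after using the predictions.
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_resid / ss_total


# Invented toy scores for illustration only (NOT data from the paper).
observed = [90, 100, 110, 120, 130]
predicted = [95, 100, 105, 125, 125]

print(round(r_squared(observed, predicted), 2))  # prints 0.9
```

An R² of .70 therefore means the predictions leave about 30% of the variance in the outcome unaccounted for.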

The paper is an important step forward in using non-test data to predict IQ. While current LLMs do not surpass assessments from a knowledgeable rater (e.g., a teacher), this paper points the way to using AI to understand people’s psychological traits better.

Original post: https://x.com/RiotIQ/status/2037172469857956332?s=20
Full article: Large language models predict cognition and education close to or better than genomics or expert assessment | Communications Psychology

The results are striking. A single essay of roughly 250 words predicted 37-59% of the variance in academic achievement. Combining essay data, teacher evaluations, and polygenic scores predicted IQ with an R² of 0.70, approaching the test-retest reliability of benchmark IQ tests.

This is a major advance in psychological assessment. Traditional IQ testing requires hours of administration by trained professionals. If LLMs can extract cognitive signals from natural writing samples, we could potentially screen for learning difficulties, giftedness, or educational needs at scale using data schools already collect. Teacher assessments remain the strongest single predictor, but AI-analyzed writing performs surprisingly well, and combining text analysis, genetics, and expert judgment gives the strongest prediction model. These findings open possibilities for early intervention and personalized education based on existing writing rather than expensive standardized testing.