Are IQ tests reliable? Why do the scores change so much?

Has anyone here looked into how reliable IQ tests actually are? I’ve been reading about test–retest reliability and how good tests, like the WAIS or Stanford-Binet, tend to show pretty stable scores over time. But I’ve also seen people say their results can fluctuate by 10 points or more depending on mood, sleep, or even the testing environment. How much variation is considered normal? And does that mean most online IQ tests are basically useless for measuring anything consistent?


Every test has a standard error of measurement (SEM), which is essentially a margin of error for its score. For full-length IQ tests, this is about +/-3 to 5 IQ points. That means that if nothing changes, a second testing would usually land within roughly 3 to 5 points of the first, though somewhat larger differences still happen by chance.
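The 3-to-5-point figure follows from the classical test theory formula SEM = SD × √(1 − r), where r is the test's reliability. Here is a minimal Python sketch. It assumes the usual IQ standard deviation of 15 and reliabilities of .90 and .95, the range quoted later in this thread. These are illustrative inputs, not figures from any specific test manual.

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

# Assumed inputs: IQ scales are normed to SD = 15, and .90-.95 is the
# reliability range commonly quoted for full-length tests.
for r in (0.90, 0.95):
    print(f"r = {r:.2f}: SEM = {sem(15, r):.2f} points")
```

With these inputs the SEM comes out to about 4.7 points at r = .90 and about 3.4 points at r = .95, which matches the 3-to-5-point range above.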

Yes, hunger, mood, sleep, motivation and other things can influence scores, but their impact is pretty minor. IQ test administrators encourage people to be well rested and perform at their best, and most people do this. The administrator can also encourage breaks, and some even bring snacks if they think hunger might be an issue for an examinee.


@russwarne For a highly gifted person, a slight drop in motivation might only cost 3 points, but for a person near the borderline range, couldn’t a sudden 5-point loss due to anxiety push them into a clinically different category? How robust is the SEM across the entire IQ distribution, especially at the extremes?

On rare occasions I have observed larger score fluctuations, but that usually means something affected the person’s performance during testing. This is why we always interpret these scores in context rather than treating them as absolute.

This is why we never base a diagnosis or clinical decision on a single test score alone; we use a comprehensive psychological battery. Yes, someone near a diagnostic threshold could shift categories with a 5-point fluctuation, so we look at multiple sources of data: adaptive functioning scales, academic/work history, previous testing if available, behavioral observations during the assessment, and sometimes a retest if we suspect the score doesn’t reflect the person’s true abilities. Ethical practice means never reducing someone’s entire cognitive profile to a single number from one testing session. We need the full picture to make sound clinical judgments.
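To put a rough number on the threshold question, you can ask how often a single observed score would fall below a cutoff when the person's true score sits just above it. The Python sketch below uses a deliberately simplified model. It assumes the observed score equals the true score plus normally distributed error with SD = SEM, and it ignores practice effects and regression to the mean. The true score of 72 and the cutoff of 70 are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

def prob_below_cutoff(true_score: float, cutoff: float, sem: float) -> float:
    """Chance that one observed score falls below `cutoff`, assuming
    observed = true score + normal error with standard deviation = SEM."""
    return NormalDist(mu=true_score, sigma=sem).cdf(cutoff)

# Hypothetical examinee: true score 72, cutoff 70 (illustrative values only).
for s in (3.0, 5.0):
    p = prob_below_cutoff(72, 70, s)
    print(f"SEM {s}: P(observed < 70) = {p:.2f}")
```

Under these assumptions the chance of landing on the other side of the line is roughly a quarter to a third, depending on the SEM. A single session can therefore plausibly misclassify someone near a cutoff, which is the case for gathering multiple sources of data.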

Professional tests have reliability coefficients of .90-.95, but scores can still vary by 5-10 points because of measurement error or test conditions (fatigue, anxiety). Changes beyond 10 points are uncommon. Online tests lack standardization and proper norming, so they’re unreliable and their scores are essentially meaningless. For a valid score, you need professional administration under standardized conditions.
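The 5-10 point figure lines up with the 95% band around a single score, which is ±1.96 × SEM. For two independent testings, each carries its own error, so the standard error of their difference is SEM × √2. The Python sketch below computes both. It assumes SD = 15, treats the quoted .90-.95 reliability as the coefficient in the SEM formula, and assumes no practice effect between sessions. These are simplifications, not a specific publisher's method.

```python
import math

SD = 15.0   # IQ scales are normed to a standard deviation of 15
Z95 = 1.96  # two-sided 95% critical value of the standard normal

def sem(reliability: float) -> float:
    # Classical test theory: SEM = SD * sqrt(1 - reliability)
    return SD * math.sqrt(1.0 - reliability)

def ci95_halfwidth(reliability: float) -> float:
    # 95% band around a single observed score: +/- 1.96 * SEM
    return Z95 * sem(reliability)

def retest_diff_95(reliability: float) -> float:
    # Each administration carries SEM of error, so the difference between
    # two of them has SD = SEM * sqrt(2) (assumes no practice effect).
    return Z95 * sem(reliability) * math.sqrt(2.0)

for r in (0.90, 0.95):
    print(f"r = {r:.2f}: score +/- {ci95_halfwidth(r):.1f}, "
          f"retest difference within +/- {retest_diff_95(r):.1f} (95%)")
```

At r = .95 the model puts 95% of retest differences within about 9 points. At r = .90 that band widens to about 13 points. Occasional double-digit changes can therefore arise from measurement error alone, even though they remain uncommon.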