How are IQ tests created?

How do psychologists actually create IQ tests? It seems incredibly complex to design questions that accurately measure intelligence across different people and backgrounds.

From what I understand, there’s a whole scientific field called “psychometrics” behind this. I know tests like the WAIS (Wechsler Adult Intelligence Scale) have been revised many times over the decades. Anyone familiar with the actual development process or the science behind it?

Welcome! The process is pretty involved. Basically, test developers create a large pool of items, pilot-test them on representative samples, then use statistical analysis to see which items actually discriminate between ability levels. They look at things like item difficulty (the proportion of test-takers who answer correctly), how well each item correlates with the total score, and whether items show bias across different groups. The norming (standardization) process is huge too: they test thousands of people to establish what scores mean at different percentiles.

It’s definitely complex! Item Response Theory (IRT) is a big part of modern test development. They don’t just pick random questions; each item goes through rigorous validation. They also use factor analysis to check that the test actually measures the abilities it claims to (like working memory vs. processing speed). And yeah, tests get revised regularly because norms drift over time. The Flynn Effect, the long-term rise in average raw scores across generations, is one reason they need constant updating.
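IRT models the probability of a correct answer as a function of the test-taker’s ability and the item’s own parameters. As a sketch, here is the two-parameter logistic (2PL) model, one common IRT model (other variants add a guessing parameter). The function names and item parameters are illustrative assumptions, not taken from any particular test.

```python
import math


def p_correct_2pl(theta, a, b):
    """Two-parameter logistic (2PL) IRT model.

    theta: test-taker ability (on a z-like scale)
    a: discrimination (how sharply the item separates ability levels)
    b: difficulty (the ability at which P(correct) = 0.5)
    Some texts multiply a by a scaling constant D = 1.7; it is omitted here.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information_2pl(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P); it peaks at theta = b."""
    p = p_correct_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


if __name__ == "__main__":
    # Hypothetical (a, b) parameters, for illustration only.
    easy_flat = (0.7, -1.0)
    hard_steep = (2.0, 1.0)
    for theta in (-2, -1, 0, 1, 2):
        p1 = p_correct_2pl(theta, *easy_flat)
        p2 = p_correct_2pl(theta, *hard_steep)
        print(f"theta={theta:+d}  easy/flat P={p1:.2f}  hard/steep P={p2:.2f}")
```

In this framing, an item is most useful where its curve is steep (high information), which is one way developers decide which ability range each item covers.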

IQ tests don’t discover that intelligence is normally distributed; they’re engineered to produce that result. During development, psychometricians deliberately select questions that spread raw scores into a bell curve, and those scores are then scaled to a mean of 100 and a standard deviation of 15. If their item pool produces scores that are too clustered, they add harder questions. This isn’t bad science; it’s intentional design for a measurement tool. You want spread for comparison purposes. But it does mean we’ve built an instrument that cannot possibly discover that intelligence follows a different distribution.
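One standard way to impose a bell curve on raw scores is the normalized standard score (area) transformation: convert each raw score to a percentile rank, map that percentile to a z-score through the inverse normal CDF, then rescale to mean 100 and SD 15. The sketch below assumes this method and uses made-up, deliberately skewed raw scores (`RAW_SCORES` and `normalized_iq` are hypothetical names); real tests differ in the details, for example by norming within age bands and smoothing the norms.

```python
import statistics

# Hypothetical, deliberately right-skewed raw scores.
RAW_SCORES = [1, 2, 3, 4, 5, 6, 8, 12, 20, 40]


def normalized_iq(raw):
    """Map raw scores to IQ-style scores via percentile rank -> inverse normal CDF."""
    n = len(raw)
    order = sorted(range(n), key=lambda i: raw[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied raw scores starting at sorted position i.
        j = i
        while j + 1 < n and raw[order[j + 1]] == raw[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    normal = statistics.NormalDist()  # standard normal, mean 0, SD 1
    # (rank - 0.5) / n keeps every percentile strictly between 0 and 1.
    return [100 + 15 * normal.inv_cdf((r - 0.5) / n) for r in ranks]


if __name__ == "__main__":
    iqs = normalized_iq(RAW_SCORES)
    for raw, iq in zip(RAW_SCORES, iqs):
        print(f"raw {raw:>2} -> IQ {iq:.1f}")
    print(f"mean IQ: {statistics.fmean(iqs):.1f}")
```

Whatever the shape of the raw-score distribution, the output of this kind of transformation is approximately normal by construction, so if a test uses it, the bell curve is imposed at this step rather than discovered.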

IQ tests are normed against representative samples, with 100 set as the average. But in practice, “representative” means a sample of people who volunteer to spend hours taking cognitive tests for research purposes. That excludes people with test anxiety who don’t volunteer, those working multiple jobs who don’t have time, communities with historical reasons to distrust psychological research (who end up largely underrepresented), and people who find academic-style testing alienating or insulting and decide to opt out.

So the “average” is really the average of a self-selected group that skews toward people comfortable with this exact type of evaluation. Then everyone else (including people who avoided being tested) gets scored relative to that group.
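The arithmetic behind this argument can be illustrated with a toy simulation. It assumes, purely hypothetically, that test scores equal a latent ability with no measurement error and that the chance of volunteering rises with that ability along a logistic curve; neither assumption comes from any real norming study, and `simulate_volunteer_norms` is an illustrative name. Under those assumptions, norms built from volunteers sit above the population average, so the population’s average member scores below 100.

```python
import math
import random
import statistics


def simulate_volunteer_norms(pop_size=100_000, seed=1):
    """Toy model of self-selected norming (all parameters are hypothetical).

    Returns (mean ability of volunteers, mean IQ of the whole population
    when everyone is scored against the volunteers' norms).
    """
    rng = random.Random(seed)
    # Latent ability for the whole population, standardized to mean 0, SD 1.
    population = [rng.gauss(0.0, 1.0) for _ in range(pop_size)]

    # Assumption: the probability of volunteering is logistic in ability,
    # so higher scorers volunteer more often.
    volunteers = [z for z in population if rng.random() < 1.0 / (1.0 + math.exp(-z))]

    # The "norms" are computed from volunteers only.
    norm_mean = statistics.fmean(volunteers)
    norm_sd = statistics.stdev(volunteers)

    # Score everyone relative to the volunteer norms on the 100/15 scale.
    population_iqs = [100 + 15 * (z - norm_mean) / norm_sd for z in population]
    return norm_mean, statistics.fmean(population_iqs)


if __name__ == "__main__":
    volunteer_mean, population_mean_iq = simulate_volunteer_norms()
    print(f"volunteer mean ability: {volunteer_mean:.2f} SD above the population mean")
    print(f"population mean IQ against volunteer norms: {population_mean_iq:.1f}")
```

The size of the gap depends entirely on the assumed volunteering curve; if volunteering were unrelated to ability, the gap would disappear.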