How do IQ test developers choose which question formats to retire?

I was thinking about how some question types just disappear over time. You see older IQ tests with formats that never show up anymore, while newer tests lean heavily on a smaller set of familiar styles. It made me wonder how those decisions actually get made.

Do test makers drop formats because they stop predicting anything useful? Or because people get too used to them and learn how to game them? I also wonder whether some question styles just do not work as well on a computer as they do on paper.

Test developers retire question formats for several reasons, and your guesses cover most of them.

First, psychometric data: if a format stops discriminating well between ability levels, or its scores become less valid as predictors of real-world outcomes, it gets dropped.

Second, test security: formats that become too familiar or get widely practiced online lose their effectiveness, because scores start to reflect practice rather than ability.

Third, administration practicality: some formats that worked on paper don’t translate well to digital testing or take too long to administer.

Fourth, cultural relevance: formats using outdated references or cultural assumptions get replaced. For example, some older verbal analogies relied on specific cultural knowledge, which biased them against test takers who lacked that background.

Modern tests favor abstract reasoning formats like matrix tasks because they’re considered harder to game through practice and less dependent on culture-specific knowledge. The shift toward computerized testing has also pushed developers toward formats that can be automatically scored and adaptively administered.
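To make the "discriminating well" point concrete, here is a minimal sketch of one common item statistic, the corrected item-total correlation: how strongly getting an item right goes along with the score on the rest of the test. Everything in it is an illustrative assumption rather than any publisher's actual procedure: the function names, the made-up response data, and the 0.2 cutoff, which is only a rule of thumb.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def item_discrimination(responses):
    """Corrected item-total correlation for each item.

    responses: one row per test taker, each row a list of 0/1 item scores.
    The item itself is left out of the total so it is not correlated with itself.
    """
    n_items = len(responses[0])
    result = []
    for j in range(n_items):
        item_scores = [row[j] for row in responses]
        rest_scores = [sum(row) - row[j] for row in responses]
        result.append(pearson(item_scores, rest_scores))
    return result


def flag_weak_items(responses, cutoff=0.2):
    """Indices of items whose discrimination falls below the (assumed) cutoff."""
    return [j for j, r in enumerate(item_discrimination(responses)) if r < cutoff]


# Made-up data: six test takers, four items (1 = correct, 0 = incorrect).
sample_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

if __name__ == "__main__":
    for j, r in enumerate(item_discrimination(sample_responses)):
        print(f"item {j}: discrimination = {r:.2f}")
    print("flagged:", flag_weak_items(sample_responses))
```

With this toy data only the last item falls below the cutoff, since people who get it right are barely more likely to do well on the other items. A real retirement decision would presumably rest on much larger samples and on other evidence as well, such as validity studies and item exposure.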

I think the paper-to-computer transition reveals that some question types don’t just adapt poorly to digital formats; they become fundamentally different tasks. Manipulating physical blocks draws on different cognitive resources than clicking and dragging with a mouse, and a timed assembly task, where part of the score is how quickly you handle the pieces, arguably has no on-screen equivalent at all. So “retirement” might sometimes be a euphemism for “we can’t actually measure this anymore without changing what we’re measuring.” The uncomfortable question is whether we’re maintaining measurement continuity or just pretending that different tasks are equivalent.