Do artificial intelligence systems help create new test items?

I recently started wondering whether AI is being used behind the scenes to help build new IQ test questions. Pattern generation, logic structures, and item variation seem like things machines could do well, at least as a starting point.

Are test developers actually using AI to generate draft items or explore new formats, or is everything still written by human item writers? If AI is involved, does it improve quality, or does it create problems with predictability or repetition? I am also curious whether AI-generated items still need heavy human review to make sure they measure the right thing.

AI is starting to be used experimentally to generate test items, especially pattern-based questions like matrix reasoning. Researchers can use machine learning to create item variants aimed at specific difficulty levels or to generate novel visual patterns. However, human psychometricians still review and validate everything closely, because AI can produce items that look good but don’t actually measure what they’re supposed to.

The main advantage is efficiency: AI can generate hundreds of candidate items quickly, and humans then select the best ones through pilot testing. The main risk is that AI-generated items may share subtle regularities or biases that human writers wouldn’t introduce, which could make them easier to game or less valid. For now, AI is mostly a tool to speed up the brainstorming phase, not a replacement for human expertise in test development.
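To make the "humans select the best ones through pilot testing" step more concrete, here is a minimal sketch of one common screening approach, classical item analysis. It assumes each pilot response is scored 0/1. For each candidate item, it computes difficulty (the proportion of test takers who answered correctly) and discrimination (the correlation between the item and the total score on the remaining items). The cutoffs (difficulty between 0.2 and 0.9, discrimination of at least 0.2), the function names, and the sample data are all illustrative assumptions, not any test publisher's actual criteria.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)


def item_stats(responses):
    """responses: one row per test taker, each row a list of 0/1 item scores.

    Returns (difficulty, corrected item-total correlation) for each item.
    """
    totals = [sum(row) for row in responses]
    stats = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        p = mean(item)  # difficulty: proportion answering correctly
        # Total score with this item removed, so the item isn't correlated with itself.
        rest = [t - x for t, x in zip(totals, item)]
        stats.append((p, pearson(item, rest)))
    return stats


def screen_items(responses, p_range=(0.2, 0.9), min_r=0.2):
    """Return indices of items that pass the (hypothetical) cutoffs."""
    keep = []
    for j, (p, r) in enumerate(item_stats(responses)):
        if p_range[0] <= p <= p_range[1] and r >= min_r:
            keep.append(j)
    return keep


# Made-up pilot data: 6 test takers x 5 candidate items.
pilot = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
]
print(screen_items(pilot))  # [0, 3, 4]
```

In this toy data, item 1 is dropped because everyone got it right, so it tells you nothing about differences between people. Item 2 is dropped because weaker performers tend to get it right, which is the kind of flaw pilot testing is meant to catch. Real programs typically go further, for example with item response theory models and bias checks across groups, but the basic idea of keeping items that behave as intended is the same.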