The authors found “converging evidence consistent with substantial publication bias” (p. 577). After adjusting for publication bias, the effect size dropped to g = .29 to .32.
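For readers unfamiliar with how a bias-adjusted estimate is obtained, here is a minimal sketch of one common approach, a PET-style (precision-effect test) regression. The data below are simulated purely for illustration, and this is not necessarily the method the authors used:

```python
# Minimal sketch of a PET-style regression, one common way to adjust a
# meta-analytic estimate for small-study / publication bias.
# The "studies" below are simulated for illustration only -- they are NOT
# the data from the creativity-training meta-analysis.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulate 50 small studies whose observed effects are inflated when their
# standard errors are large (the classic funnel-plot asymmetry pattern).
true_effect = 0.30
se = rng.uniform(0.15, 0.45, size=50)       # SEs typical of samples around 50
bias = 0.6 * se                             # larger SE -> more inflation
g = true_effect + bias + rng.normal(0, se)  # observed standardized effects

# PET: weighted least squares of effect size on standard error.
# The intercept estimates what a hypothetically infinitely precise (SE = 0)
# study would show, i.e., a bias-adjusted effect.
X = sm.add_constant(se)
fit = sm.WLS(g, X, weights=1.0 / se**2).fit()
print(f"naive weighted mean effect: {np.average(g, weights=1/se**2):.2f}")
print(f"PET-adjusted effect:        {fit.params[0]:.2f}")
```

In the simulation, the naive average sits well above the true effect while the intercept lands much closer to it, which is the same qualitative pattern as the drop reported in the meta-analysis.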
Statistical power was also very low relative to the adjusted effect size: fewer than 10% of studies had enough power to detect an effect of .30, and less than half had enough power to detect an effect of .60. This is unsurprising, given that the median sample size was 53.
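To make the power problem concrete, here is a back-of-the-envelope sketch using Python's statsmodels. It assumes the median total N of 53 is split roughly evenly into two groups of about 26-27; that split, and the two-sided t-test framing, are my assumptions, not details from the meta-analysis:

```python
# Rough power calculations for a two-group comparison, assuming ~27 per group
# (the median total N of 53 split roughly in half). Illustrative numbers only,
# not figures taken from the meta-analysis.
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower()

for effect in (0.30, 0.60):
    p = power.power(effect_size=effect, nobs1=27, ratio=1.0, alpha=0.05)
    print(f"power to detect d = {effect:.2f} with ~27 per group: {p:.2f}")

# Sample size per group needed for 80% power at the adjusted effect (~0.30).
n_needed = power.solve_power(effect_size=0.30, alpha=0.05, power=0.80, ratio=1.0)
print(f"n per group for 80% power at d = 0.30: {n_needed:.0f}")
```

Under these assumptions, power comes out around 20% for an effect of .30 and around 60% for an effect of .60, and roughly 175 participants per group would be needed for a properly powered test of the adjusted effect, several times the typical study size in this literature.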
There was also circumstantial evidence of widespread questionable research practices (QRPs). Over 40% of studies that used a divergent thinking test as an outcome variable did not report all of the scores those tests produce, which suggests that selective reporting was at work. Other QRPs may be present as well.
Finally, modern research practices are almost completely absent from creativity training studies. Only 7 replications were found (just 2 of them from 2010 or later), along with a single pre-registered study.
Based on this meta-analysis, it is safe to say that there are no high-quality studies of creativity training. Maybe we can train people to be more creative, but given the quality of the evidence, no one really knows. This is why the authors stated, “. . . practitioners and researchers should be careful when interpreting current findings in the field” (p. 577).
The drop from g = .53 to g = .29 to .32 after correcting for publication bias is massive and shows how unreliable the published literature is. Worse, even the adjusted effect might be inflated, given the widespread methodological problems, selective reporting, and lack of replications. The fact that zero studies met all four quality criteria is damning. This isn't just a weak effect; it's an entire field built on shaky foundations. The creativity training industry is selling programs based on research that wouldn't pass basic scientific standards. Until we get pre-registered, adequately powered, high-quality studies, we simply don't know whether creativity training works.
The selective reporting finding is especially concerning. If over 40% of studies didn't report all the scores their measures produced, it is hard to avoid the conclusion that favorable results were being cherry-picked. Combined with tiny sample sizes, almost no replications, and pervasive publication bias, the true effect could be close to zero. The power analysis shows that most studies couldn't have detected a real effect even if one existed. This is similar to the working memory training literature: lots of hype, weak methodology, and publication bias masking null results. The lesson is that a meta-analysis of low-quality research can't save you; garbage in, garbage out. We need a complete restart of creativity training research under modern standards.