How do test designers handle items that produce mixed interpretations?

I have come across a few questions where different people gave completely different answers, and each answer seemed reasonable depending on how you read the question. That made me wonder how test designers deal with items that do not have a single clear interpretation.

Do they remove those questions entirely, or do they study how people respond and then revise them? I imagine mixed interpretations could hurt the accuracy of the test, but it also seems like some level of disagreement might be expected.