The first version of our smart-reply model would happily draft a response to anything you fed it, including obvious spam. The drafts were technically correct. They were also funny in a way that wasn't great for our brand.
One of our beta customers shared a screenshot. A spam submission about offshore SEO services had received a polite, professional, AI-generated reply from their company: 'Thanks for reaching out about your SEO offering. We're not in the market right now, but please keep us in mind for the future.' The customer was amused. We were not.
What we changed
The fix was four lines of code. If the spam confidence is above 40, the reply model returns null instead of generating. The UI shows 'no reply drafted — spam suspected' instead. Customers can override and force a draft, but no one ever does, because in those cases the spam classifier is almost always right.
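For illustration, a guard of this kind might look roughly like the sketch below. The names `draft_reply`, `SPAM_THRESHOLD`, `generate`, and `force` are hypothetical, and the 0-100 confidence scale is an assumption; the only number taken from our fix is the threshold of 40.

```python
from typing import Callable, Optional

# Assumption: spam confidence is reported on a 0-100 scale.
SPAM_THRESHOLD = 40


def draft_reply(
    message: str,
    spam_confidence: float,
    generate: Callable[[str], str],
    force: bool = False,
) -> Optional[str]:
    """Return a drafted reply, or None when the message looks like spam.

    `generate` stands in for the real reply model (hypothetical).
    `force` models the customer override that forces a draft.
    """
    if spam_confidence > SPAM_THRESHOLD and not force:
        return None  # refuse: the caller shows the "spam suspected" notice
    return generate(message)


def fake_model(text: str) -> str:
    # Placeholder generator so the sketch runs without a real model.
    return "Thanks for reaching out."


spam_draft = draft_reply("Cheap offshore SEO!", 93, fake_model)
normal_draft = draft_reply("Where is my invoice?", 5, fake_model)
forced_draft = draft_reply("Cheap offshore SEO!", 93, fake_model, force=True)
```

Here `spam_draft` is `None`, while `normal_draft` and `forced_draft` both contain the placeholder text. Returning `None` rather than an apology string keeps the refusal unambiguous for whatever calls the model.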
An AI that knows when not to speak is more valuable than one that always has something to say.
The deeper lesson
It's easy to optimise an AI feature for coverage — what fraction of inputs produce a useful output. Coverage is the wrong target. The right target is precision-conditional-on-output: when the model does produce something, how often is it good? Refusal is a feature, not a failure mode.
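To make the two targets concrete, here is a small sketch that computes both from a list of labelled outcomes. The function name and the outcome encoding (`None` for a refusal, `True` for a good output, `False` for a bad one) are assumptions made for this example, and the sample data is invented purely to show the arithmetic.

```python
from typing import Optional, Sequence, Tuple


def coverage_and_precision(
    outcomes: Sequence[Optional[bool]],
) -> Tuple[float, Optional[float]]:
    """Compute (coverage, precision-conditional-on-output).

    coverage  = good outputs / all inputs
    precision = good outputs / outputs actually produced
    Precision is None when the model refused every input.
    """
    good = sum(1 for o in outcomes if o is True)
    produced = sum(1 for o in outcomes if o is not None)
    coverage = good / len(outcomes) if outcomes else 0.0
    precision = good / produced if produced else None
    return coverage, precision


# Invented data: ten inputs, two of which are spam.
always_answers = [True] * 8 + [False] * 2  # drafts a bad reply to the spam
refuses_spam = [True] * 8 + [None] * 2     # stays quiet on the spam

print(coverage_and_precision(always_answers))  # (0.8, 0.8)
print(coverage_and_precision(refuses_spam))    # (0.8, 1.0)
```

In this toy case, refusing costs nothing in coverage, because the spam inputs never had a useful output to begin with, but it lifts precision from 0.8 to 1.0.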
We've since added similar refusal conditions to two other classifiers. The tag generator refuses to add tags when its confidence is below 0.6. The category router refuses to route when no category scores above 0.5 — it falls back to 'unrouted' and lets the human decide. Customers thank us for the refusals more than they thank us for the predictions.
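The two refusal conditions above can be sketched as follows. The function names and the score dictionaries are hypothetical, and applying the 0.6 threshold per tag is an assumption about how the tag generator's confidence is reported; only the thresholds and the 'unrouted' fallback come from our setup.

```python
from typing import Dict, List

TAG_MIN_CONFIDENCE = 0.6  # below this, no tag is added
ROUTE_MIN_SCORE = 0.5     # a category must score above this to be chosen
UNROUTED = "unrouted"


def select_tags(tag_scores: Dict[str, float]) -> List[str]:
    """Keep only tags whose confidence is at least TAG_MIN_CONFIDENCE.

    Assumption: confidence is reported per tag.
    """
    return [tag for tag, score in tag_scores.items() if score >= TAG_MIN_CONFIDENCE]


def route(category_scores: Dict[str, float]) -> str:
    """Return the top-scoring category, or UNROUTED when none clears the bar."""
    if not category_scores:
        return UNROUTED
    best, score = max(category_scores.items(), key=lambda item: item[1])
    return best if score > ROUTE_MIN_SCORE else UNROUTED


tags = select_tags({"urgent": 0.9, "refund": 0.4})
confident_route = route({"billing": 0.7, "sales": 0.2})
uncertain_route = route({"billing": 0.45, "sales": 0.3})
```

With these inputs, `tags` is `["urgent"]`, `confident_route` is `"billing"`, and `uncertain_route` is `"unrouted"`, which leaves the decision to a human.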
If you're building a product on top of LLMs, sit down and write your refusal policy. Then enforce it at the model boundary, not in the UI. The users you're trying to protect won't see the model's output anyway, so they can't complain about a model that's quiet.