Essay · 02·2026 · 2 min

Weapon or Tool

Every AI interface decides whether its output reads as a suggestion or a command.

There’s a moment in Arrival where translators around the world read the same alien symbol differently: some as “weapon,” others as “tool.” Same signal. Opposite meaning. The world almost goes to war over it.

Ian tries to make sense of the Heptapod logogram — meaning is clear to the alien, ambiguous to everyone else

I design AI interfaces. We make that choice every day. When an AI generates a response, the interface decides whether it reads as a suggestion or a command. We usually default to command without realizing it.

The problem

Every AI chat interface ships the same way. Clean, confident prose. No hedging. The output looks identical whether the model is 95% confident or making it up.

Heptapod ink — a logogram forming on the glass, meaning compressed into a single gesture

First time I used ChatGPT for real work, I pasted a response straight into a doc without checking. Years of cross-referencing sources, gone. The formatting did it. Clean paragraphs, confident tone. My brain read it as finished, not a guess.

Google gives you ten links and says “you decide.” AI chat gives you one answer and says “here you go.” We centralized decision-making into a single voice and made it look authoritative.

What we did at Factweavers

We built an analytics tool that generated insights. Early version: “Q3 revenue dropped 12%, driven primarily by SMB churn.” Users copy-pasted it straight into board decks. But some insights were correlations the model presented as causation.

We changed the output layer. Added confidence scores: “Q3 revenue dropped 12% (confidence: 73%).” Users investigated the low-confidence ones.

Louise at the whiteboard — you can't jump to "what is your purpose on Earth?" without building shared language first

Added explainability: see the reasoning, data sources, adjust parameters. Surfaced alternatives: “Revenue dropped 12%, likely SMB churn. But the timing also lines up with the July pricing change, and the model can’t tell which caused it.”

Usage went up. Users already knew they couldn’t fully trust the tool. We gave them a way to question it.

The details

“This page might convert better with a simpler layout” is a suggestion. “Simplify this layout” is a command. Wording is only one of the signals. Typography does it. Font size does it. One answer vs. three options does it.

At Factweavers, the empty query box said “Ask anything.” That implied the system could answer anything. We replaced it with suggested queries scoped to what the tool was good at.

Why it ships this way

Louise reconstructing the logogram — building shared understanding one symbol at a time

“Make it feel magical” is a real note I’ve gotten. Showing uncertainty feels like weakness. Showing alternatives feels indecisive.

We tried confidence bars on everything in an early prototype. Result: nothing felt reliable. 90% confidence looked the same as 60% because visual noise drowned everything out.

The answer is granularity. High confidence? Present it clean. Medium confidence? Show your work. Low confidence? Flag it, show alternatives. Most products ship one treatment for all states.
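Here is roughly what three treatments could look like in code. This is a minimal sketch, not what we shipped: the `Insight` shape, the `render` function, and the 0.85 / 0.60 cutoffs are all made up for illustration, and it assumes the confidence number is already calibrated.

```python
from dataclasses import dataclass, field

# Hypothetical cutoffs. Real ones would come out of calibration work.
HIGH = 0.85
LOW = 0.60


@dataclass
class Insight:
    claim: str
    confidence: float  # assumed to be a calibrated probability in [0, 1]
    reasoning: str = ""
    alternatives: list[str] = field(default_factory=list)


def render(insight: Insight) -> str:
    """Pick one of three presentations based on confidence."""
    if insight.confidence >= HIGH:
        # High confidence: present it clean, no extra chrome.
        return insight.claim
    if insight.confidence >= LOW:
        # Medium confidence: show the score and the reasoning.
        return (
            f"{insight.claim} (confidence: {insight.confidence:.0%})\n"
            f"Why: {insight.reasoning}"
        )
    # Low confidence: flag the claim and surface the alternatives.
    lines = [f"Unverified: {insight.claim} (confidence: {insight.confidence:.0%})"]
    lines += [f"  Also possible: {alt}" for alt in insight.alternatives]
    return "\n".join(lines)
```

The point of the sketch is the branching, not the strings. The high-confidence path adds nothing, so the score and the flags only appear where they carry information.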

The cost

The honest version is harder to build. The Factweavers redesign took ten weeks longer. Confidence scores need calibration. Explainability needs its own UX.

At one company we shipped an AI feature without the UX work. Took four months to unship what took two weeks to build. Users treated recommendations as directives. When they were wrong (about 15% of the time), support tickets said “the tool lied to me.”

An AI that earns trust slowly is more useful than one that borrows it and loses it.

Published 02·2026 · 524 words · 2 min
