Honest Uncertainty: Revealing Confidence in AI Products
AI’s fluency can empower or mislead. To unlock its potential, we must overcome our own tendency to misread uncertainty.
AI promises to make knowledge that was once locked behind cost and access barriers available to anyone, anywhere. Medical advice, financial planning, legal insights, unhindered by gatekeepers, expense, or geography. But alongside this promise comes a profound responsibility: these systems don’t deal in certainties. Their outputs are probabilistic, calculated guesses wrapped in convincing language. As designers and product makers, our obligation is to surface that probabilistic nature thoughtfully—especially when the stakes are high.
The challenge isn’t to eliminate uncertainty. That’s impossible. Instead, our job is to transform it into clarity. If we can’t provide users with certainty, we can provide them with next steps, actionable options, and collaborative tools that guide them through ambiguity. An AI system doesn’t need to predict the future perfectly; it needs to help users navigate it confidently—whether by surfacing confidence scores, offering transparent context, or simply framing recommendations as starting points rather than conclusions.
This is where we need a framework—a way to balance transparency with usability, trust with complexity. Humans have limitations: we crave certainty, oversimplify probabilities, and often misjudge risk. But design, when done with awareness and empathy, can work around those limitations. It can nudge us toward better decisions, build trust in imperfect systems, and turn uncertainty into opportunity.
The promise of AI is in its ability to offer fluent and deeply personalised outputs in a range of subjects, with the potential to broaden access to expert services at tremendous scale. The challenge is our ability to design systems that acknowledge their limits and ours, while unlocking new ways for people to learn, decide, and act. The question isn’t just whether we should surface AI’s uncertainty—it’s how we can help people understand it, and in doing so, empower them to move forward with confidence.
Every morning, I walk my kids to school. It’s not just a chore or a box to tick—it’s part of the rhythm of my day. A pocket of time together before the world pulls us all in different directions. Mornings like these are predictable in their unpredictability: breakfasts on the table, shoes that mysteriously go missing, and one small decision that sneaks up on me—whether or not to bring the umbrella.
To make that call, I glance at my watch. There’s a little widget on the face—a doughnut chart with a percentage at its centre. Today, it says 30%. That percentage should guide my decision. Thirty percent means... what? A light drizzle, maybe. Rain that’s unlikely to show up. But there’s a part of me that baulks—“30% isn’t zero,” I think. “What if today is the one in three when it rains?”
Over time, I’ve built my own rules for interpreting this widget: over 50%, I bring the umbrella. Under 40%, I don’t. (The zone in between? Anyone’s guess.) Simple, right? But heuristics like this aren’t perfect. They’re the duct tape of decision-making. Sometimes, the rain comes anyway, and we arrive at school soggier than I’d like. Whose fault is it? Mine, for interpreting the number poorly? Or the forecast’s, for not being more precise? Either way, the result is the same: damp shoes, damp backpacks, damp spirits.
This is the challenge of probabilities: we bring our own heuristics, biases, and assumptions to them. We fill in the gaps with intuition, and sometimes, we get it wrong. But this challenge doesn’t just belong to weather forecasts—it’s deeply embedded in how we interact with AI systems.
Large language models (LLMs), for example, operate entirely on probabilities. At their core, they’re grids of numbers—mathematical structures calculating the next likely word, the next phrase. And yet their outputs feel anything but probabilistic. They’re fluid, convincing, and often beautifully coherent. This coherence is part of their magic, but it’s also part of their danger.
When an LLM suggests an answer, it’s not speaking from truth or certainty. It’s generating its best guess, drawn from patterns and probabilities. This probabilistic nature is invisible to most users because the output feels so human. It doesn’t hedge, hesitate, or explain itself—it just speaks. And that’s where we, as designers, step in.
Our responsibility as designers is to determine when—and how—to balance the impressiveness of these outputs with their underlying uncertainty. Not every scenario calls for transparency. In low-stakes contexts, like recommending a movie or drafting an email, it might not matter that an LLM’s response is probabilistic. But in high-stakes situations—healthcare, hiring, financial planning—people’s wellbeing is on the line. Users might act on these outputs, trust them, or even rely on them to make life-altering decisions. And in those moments, the system’s probabilistic nature can’t remain invisible.
But transparency isn’t easy. How do we surface a system’s uncertainty without overwhelming or misleading users? How do we help them navigate probabilities when we know they’re prone to misinterpretation, as I’ve learned from my soggy school runs?
As designers, our challenge isn’t simply to expose uncertainty—it’s to shape how people understand it. When do we surface a system’s confidence? How do we frame it to inform rather than overwhelm? And when is it better to avoid probabilities entirely and focus instead on actionable outcomes?
These are the questions I want to explore. Because at the end of the day, this isn’t just about designing to expose the intricacy of complex systems. It’s about designing for people—their biases, their shortcuts, and the ways they make sense of the world.
The Problem with Probabilities
The precision of numbers can feel reassuring. But once they get tossed into the human mind, they don’t stay tidy for long. Probabilities, in particular, grind our mental gears. They enter through the front door, all polished and scientific—“30% chance of rain”—and by the time they reach our decisions, they’ve been mangled into something simpler but less accurate: “It won’t rain.”
This is how we deal with uncertainty: we make it smaller, simpler, something easier to hold in our heads. And most of the time, that works well enough. But when the stakes are higher these mental shortcuts can trip us up.
Probabilities are honest in their uncertainty. The trouble is, we aren’t wired to think probabilistically.
The Tricks Our Minds Play
Behavioural psychologists Amos Tversky and Daniel Kahneman spent decades studying how we process probabilities, and what they found was equal parts fascinating and humbling. When faced with uncertainty, our brains rely on shortcuts—mental rules of thumb called heuristics—to make sense of it all. These shortcuts save us time and effort, but they also create blind spots, making probabilities feel simpler, more certain, or more dramatic than they really are.
Here are a few of the most common tricks our minds play:
We Round to Yes or No.
Our brains don’t like shades of grey. We prefer binary answers—rain or no rain, yes or no, right or wrong. A probability like “30% chance of rain” doesn’t feel like 30% at all; it feels like a coin flip, or maybe just a safe bet to leave the umbrella. We flatten uncertainty into absolutes, even when the world refuses to cooperate.
We Fixate on What’s Vivid.
The more vivid an outcome is, the more likely we are to think it will happen. This is called the availability heuristic, and it’s why we’re more afraid of flying than driving (plane crashes are rare, but dramatic). If you’ve ever been caught in an unexpected downpour, you might overestimate the likelihood of rain the next time you see a cloudy sky.
We Trust Ourselves Too Much.
When numbers feel familiar, we get overconfident. We assume we understand what they mean, even when we don’t. A “70% match” on a hiring tool might seem definitive—but what does that number actually represent? Skills? Experience? Potential? Without context, we fill in the gaps ourselves, often incorrectly.
We Ignore the Bigger Picture.
Probabilities never exist in isolation—they’re always part of a broader context. But we tend to focus on the number in front of us and forget what’s behind it. A medical app might say there’s an “85% chance of Condition X,” but if that condition is vanishingly rare in the general population, the likelihood you actually have it is much lower. We miss the base rate—the background probability that makes sense of the whole picture.
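To see how sharply the base rate can pull a confident-sounding number back to earth, here’s a minimal worked sketch in Python. Every figure in it is an illustrative assumption, not a clinical statistic:

```python
# Bayes' theorem sketch: why an "85% chance of Condition X" signal can
# still mean low real-world odds when the condition is rare.
# All numbers below are invented for illustration.

base_rate = 0.001       # assume 1 in 1,000 people actually have Condition X
sensitivity = 0.85      # P(positive signal | has the condition)
false_positive = 0.10   # P(positive signal | doesn't have the condition)

# Overall chance of seeing a positive signal, across everyone
p_positive = sensitivity * base_rate + false_positive * (1 - base_rate)

# Chance you actually have the condition, given a positive signal
posterior = (sensitivity * base_rate) / p_positive

print(f"Real-world probability of Condition X: {posterior:.1%}")
# Prints roughly 0.8% -- nowhere near the 85% a user might assume.
```

The 85% describes the signal, not the person; without the base rate, the headline number is nearly meaningless.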
Why It Matters
The problem is, as humans, we don’t like uncertainty. It makes us uncomfortable. So we try to hammer it into something certain, something solid we can act on—even if it’s not entirely accurate.
That’s not to say we’re hopeless. We’ve been navigating uncertainty forever, from interpreting weather patterns to making bets on our careers and relationships. But we need help—tools and systems that don’t just throw probabilities at us, but help us make sense of them.
And that’s where design comes in.
Design isn’t about making uncertainty disappear. It’s about giving people what they need to move forward with confidence—even when the future is uncertain. It’s about recognising that probabilities aren’t the problem. The real problem is how we, as humans, interpret them.
When (and Why) to Surface Confidence
Not every AI output needs to be labelled with a measure of confidence. The decision to surface uncertainty should be tied to the stakes of the situation, the user’s goals, and the broader context in which the system operates.
Low-Stakes vs. High-Stakes Contexts
The first question to ask is: what’s at stake for the user?
If the stakes are low, users don’t need to see the uncertainty. In fact, showing confidence scores might overcomplicate things. Take something like autocomplete in an email client or a movie recommendation engine. Nobody cares if the system is 60% sure about the next word in your sentence or 85% sure you’ll like the latest action flick. You try it out, and if it works, great! If it doesn’t, no harm done.
But as the stakes climb, the need for transparency grows. The tipping point is when users might act on the system’s output in ways that could have lasting or serious consequences.
A medical diagnostic app suggesting a possible condition. A hiring tool scoring a candidate’s fit for a role. A financial platform forecasting your retirement savings. These aren’t just “nice-to-have” recommendations—they’re nudges that might shape major decisions about someone’s health, career, or future. And in those moments, users need to know not just what the system is suggesting, but why—and how confident it is.
Why Transparency Matters in High-Stakes Scenarios
1. To Build Trust
Trust doesn’t come from pretending to be perfect. It comes from admitting when you’re uncertain. Users don’t expect AI systems to get everything right, but they do expect honesty. Surfacing confidence—especially in high-stakes scenarios—signals that the system understands its own limits.
Take a medical app, for example. Imagine it suggests a diagnosis with “85% confidence.” That number, on its own, might feel impressive. But what’s more trustworthy: a system that gives a number without context or one that says, “I’m 85% confident based on these symptoms, but additional tests are needed to confirm this diagnosis”? The latter feels more human, more collaborative. It acknowledges its uncertainty and shares the responsibility of judgment with the user.
2. To Support Better Decisions
When users are aware of a system’s confidence—or lack thereof—they’re better equipped to make informed decisions. A hiring manager might see a candidate scored at “60% fit” and ask, “Why is this number so low? Is it because of missing qualifications, or is the system weighting something I don’t value as heavily?” That moment of reflection can make all the difference, shifting the user from passive recipient to active participant.
Without transparency, the opposite can happen: blind trust or outright dismissal. A candidate scored at “90% fit” might get an interview based on the system’s score alone, even if the job requires unique skills the model didn’t account for. And a 60% score might lead to unjustified rejection, even if the candidate is perfect for the role in ways the system couldn’t measure. Confidence labelling encourages scrutiny rather than deference.
3. To Meet Ethical Obligations
Finally, there’s the ethical dimension. In high-stakes contexts, designers have a responsibility to ensure that systems don’t mislead or harm users. This means being honest about the limits of probabilistic outputs—especially when users might otherwise assume the system is definitive.
Imagine a financial planning tool predicting an 85% chance you’ll meet your retirement goal. That might sound reassuring, but without surfacing the factors behind that prediction—market assumptions, saving habits, historical trends—the user might overestimate the reliability of the forecast. And if they fall short, who’s responsible: the user who trusted the system, or the designers who didn’t disclose its limitations?
The Designer’s Decision
Of course, there’s no universal rule for when to surface confidence—it depends on the context and the stakes. But as a starting point:
Low-Stakes Contexts: Keep it simple. In these cases, transparency can take a backseat to seamlessness. Users don’t need to know the system’s confidence because the stakes are low, and they can immediately judge the outcome for themselves. Think of autocorrect, movie recommendations, or product suggestions.
High-Stakes Contexts: Be honest about uncertainty. When decisions carry weight—affecting health, money, or livelihoods—it’s essential to surface confidence in a way that empowers users to make informed decisions. The goal isn’t to overwhelm them with probabilities but to offer enough transparency to build trust, encourage scrutiny, and avoid misleading assumptions.
The challenge is this: transparency done poorly can create as many problems as it solves. Show a user too much information, and they’re overwhelmed. Show them raw probabilities, and they might misinterpret or overtrust them. The key isn’t just surfacing uncertainty—it’s framing it in ways that are clear, actionable, and human.
Which leads us to the next question: how do we do that?
How to Surface Confidence Effectively
Transparency is a tricky business. It sounds simple—just be honest! But honesty without care can do more harm than good. When it comes to surfacing confidence in AI systems, the goal isn’t just to reveal what the system is sure about (or unsure about). It’s to make that uncertainty useful. A system that says, “I’m 70% confident” without explaining what that means is just passing the burden of interpretation to the user.
As designers, our job is to frame uncertainty in ways that help people—not hinder them. We’re not here to make probabilities disappear or overwhelm users with raw numbers. Instead, we need to provide just enough clarity and context so people can act confidently, even in the face of uncertainty.
Here’s how we might do that:
1. Translate Probabilities into Practical Guidance
People don’t want probabilities. They want to know what to do.
Imagine you’re looking at a weather app on the morning of your niece’s outdoor birthday party. It says, “40% chance of rain.” What does that mean for you? Should you bring an umbrella? Move the party indoors? Flip a coin?
The problem isn’t with the number; it’s with the gap between the number and the action. The app gives you a probability, but you have to figure out the rest.
Now imagine the app says, “There’s a chance of light rain this afternoon. If you’ll be outside, bring a light jacket or umbrella.” That’s better, right? The app isn’t just throwing numbers at you—it’s giving you guidance that’s practical and relevant to your situation.
Design Tip: Tie probabilities to concrete, actionable advice. Instead of saying, “85% confidence in this diagnosis,” a medical app could say, “Based on your symptoms, we recommend scheduling a follow-up test to confirm.”
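As a sketch of that translation step, here’s how a forecast number plus a little user context might become advice. The thresholds and copy are illustrative assumptions, not a production policy:

```python
def rain_advice(rain_probability: float, plans_outdoors: bool) -> str:
    """Turn a raw forecast probability into situational guidance.

    The cut-offs below are illustrative, not meteorological standards.
    """
    if not plans_outdoors:
        return "No action needed: you'll be indoors anyway."
    if rain_probability >= 0.6:
        return "Rain is likely. Consider moving the party under cover."
    if rain_probability >= 0.3:
        return "There's a chance of light rain. Bring umbrellas, just in case."
    return "Rain is unlikely. Outdoor plans should be fine."

print(rain_advice(0.40, plans_outdoors=True))
# "There's a chance of light rain. Bring umbrellas, just in case."
```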
2. Use Visual Cues for Clarity
Sometimes, it’s easier to feel a number than to read it. That’s why we use thermometers to show temperature or pie charts to show proportions. Visuals help us grasp concepts that are too abstract to explain with words.
Take a medical diagnostic tool. Instead of just saying, “85% confidence in Condition X,” it might show a heat map over a scan, shading areas of high and low confidence. Or a financial planning app might overlay a confidence band on a line chart to show the range of possible outcomes. These visuals don’t just display uncertainty—they make it tangible, something you can see and, more importantly, trust.
Design Tip: Use gradients, ranges, or icons to convey uncertainty visually. This softens the edges of raw numbers and makes probabilities easier to interpret at a glance.
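Here’s a minimal sketch of the confidence-band idea with matplotlib, using invented forecast figures in place of a real model’s output:

```python
import matplotlib.pyplot as plt
import numpy as np

# Invented retirement-forecast data: a central projection plus a band
# covering the plausible range of outcomes.
years = np.arange(2025, 2046)
expected = np.linspace(100, 400, len(years))  # central estimate, in $k
low = expected * 0.75                          # pessimistic bound
high = expected * 1.30                         # optimistic bound

fig, ax = plt.subplots()
ax.plot(years, expected, label="Expected savings")
# The shaded band makes the uncertainty visible without quoting a raw number.
ax.fill_between(years, low, high, alpha=0.2, label="Plausible range")
ax.set_xlabel("Year")
ax.set_ylabel("Savings ($k)")
ax.legend()
plt.show()
```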
3. Reframe Outputs Around Outcomes
People don’t think in terms of probabilities—they think in terms of priorities. What matters to them isn’t whether a route has an “88% likelihood of being the fastest” but whether it avoids traffic, gets them there on time, or makes the drive less stressful.
By anchoring on outcomes instead of confidence scores, we shift the focus from numbers to what’s important. A navigation app, for instance, might say, “This route avoids Main Street, which tends to get busy around this time of day,” instead of, “88% likelihood of fastest route.” The output is the same, but the framing feels more actionable.
Design Tip: Focus on outcomes that align with user priorities. Frame recommendations as “here’s what we know, and here’s what we recommend,” rather than a sterile percentage.
4. Offer Explanations Instead of Absolutes
Confidence scores can feel arbitrary without context. Why is the system “85% confident” in this answer? What’s behind the number?
Imagine a hiring tool that evaluates candidates. It rates one candidate as a “92% fit.” But what does that mean? Without context, the score feels definitive—but also opaque. Now imagine the system says, “This score is based on skills and experience matches but doesn’t account for cultural fit or team dynamics.” That explanation changes everything. It invites users to scrutinise the score and consider factors the system can’t evaluate.
Confidence doesn’t have to feel like an ultimatum. It can be the beginning of a conversation.
Design Tip: Pair confidence scores with explanations. “This recommendation is based on X, Y, and Z factors, but doesn’t consider A or B.” This builds trust by showing what the system knows—and what it doesn’t.
5. Replace Percentages with Tiers
Raw percentages can feel overly precise, even when they’re not. A “78.3% confidence score” feels exact, but the truth is, it’s not much different from “moderately confident.” When precision feels misleading, simplicity is a better choice.
Take a medical app, for instance. Instead of saying, “85% confidence,” it could label findings as “Low Concern,” “Moderate Concern,” or “Critical—Immediate Follow-Up Needed.” These tiers feel less intimidating and give users a clearer sense of what to prioritise.
Design Tip: Use qualitative labels like “High Concern” or “Low Confidence” instead of precise percentages that might mislead users.
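A sketch of what that mapping might look like in code. The thresholds are placeholders and would need calibrating against the model’s actual error rates:

```python
def concern_tier(confidence: float) -> str:
    """Map a raw model confidence to a qualitative tier.

    Thresholds are illustrative; calibrate them against real outcomes.
    """
    if confidence >= 0.90:
        return "Critical: Immediate Follow-Up Needed"
    if confidence >= 0.60:
        return "Moderate Concern"
    return "Low Concern"

print(concern_tier(0.783))  # "Moderate Concern", not a false-precision "78.3%"
```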
6. Frame as Collaborative, Not Decisive
Nobody likes being told what to do. But they love being guided. Confidence scores work best when they position the system as a partner, not a boss.
A financial planning tool might say, “This savings strategy aligns with your goals, but let’s explore a few alternatives to confirm it’s right for you.” Or a medical app might say, “This diagnosis is likely, but we recommend consulting a doctor to confirm.” In both cases, the system isn’t dictating decisions—it’s inviting collaboration.
Design Tip: Avoid language that feels absolute (“This is the best option”) and lean into language that invites exploration (“Let’s refine this together”).
7. Highlight Next Steps, Not Just Confidence
Sometimes, the best way to frame uncertainty is to focus on what happens next. Instead of presenting a probability score, guide the user toward the next action.
Take a medical diagnostic tool. Instead of saying, “85% confidence in Condition X,” it could say, “We detected an abnormality in your scan. While this may not indicate malignancy, further testing is strongly recommended.” This approach shifts the focus from uncertainty to action, helping users navigate what might otherwise feel like a dead end.
Design Tip: Confidence scores should point users toward clear, actionable next steps, rather than leaving them stranded with a number they might not understand.
The Balancing Act
When we surface confidence, we’re doing more than showing numbers. We’re helping users navigate uncertainty—bridging the gap between what the system knows and what the user needs to decide.
But it’s a balancing act. Too much detail, and the user feels overwhelmed. Too little, and they might trust the system too much—or not at all. The goal is to give users just enough context, clarity, and guidance to move forward with confidence, even when the system itself isn’t certain.
When Not to Surface Confidence
Here’s something to consider: not every question needs an answer, and not every answer needs a qualifier. As much as we’ve talked about the value of revealing uncertainty, there are times when confidence scores do more harm than good.
Sometimes, showing probabilities solves a problem. Other times, it creates one.
The danger is this: confidence scores, percentages, and qualifiers can overwhelm, distract, or even mislead. They’re a layer of complexity that might not always be necessary. And as designers, part of our job is to know when to take things away—to simplify rather than over-explain.
When is it better to leave confidence in the background? Let’s look at a few scenarios.
1. When the Stakes Are Low
There’s a principle in design that says: don’t make people think harder than they need to. In low-stakes contexts, showing confidence scores adds cognitive load without much payoff.
Think about something as mundane as movie recommendations. If an app suggests a film, does it really matter if it’s “72% confident” you’ll like it? Probably not. You’ll watch the trailer, scan the description, and decide for yourself. Surfacing confidence here isn’t helpful—it’s noise.
Or take autocorrect in a text message. You don’t need to know the algorithm’s confidence in suggesting the word “interesting.” You just need it to work—or fail fast enough that you can fix it yourself.
In these cases, showing confidence slows things down. It creates friction in situations where users want speed, simplicity, and seamlessness.
The Takeaway: If the stakes are low and users can easily judge the output for themselves, don’t bother surfacing confidence. Let the system’s success—or failure—speak for itself.
2. When the User Can Validate the Output
There are moments when confidence doesn’t need to be shown because the user is already equipped to validate the output.
Think about grammar-checking tools. If a system suggests, “Did you mean affect instead of effect?” the user can immediately decide whether the correction makes sense. They don’t need to know the system’s confidence in the suggestion—they have the expertise to evaluate it themselves.
Or imagine a photo-editing app suggesting a filter. It doesn’t need to say, “80% confident this is the best look for your photo.” The user can see the result, tweak it, and decide if it’s right.
In these cases, confidence scores feel redundant. They’re not adding anything the user doesn’t already know—or can’t figure out themselves.
The Takeaway: If users can quickly and confidently judge the output for themselves, skip the confidence score.
3. When Confidence Feels Arbitrary
Numbers without context feel hollow. And when they feel hollow, they can backfire—leading users to trust too much or not at all.
This is especially true for systems like LLMs, which don’t inherently know the accuracy of their outputs. Imagine a chatbot offering legal advice. If it says, “I’m 78% confident this clause is enforceable,” the number feels precise—but what’s it actually based on? Without understanding how the system arrived at that figure, the user is left guessing: What’s the other 22%? Is this reliable or not?
Or think about a hiring tool that says a candidate is “85% likely to succeed in this role.” That number might feel authoritative, but it’s just a reflection of the data the system was trained on—which could be incomplete or biased. Without context, confidence scores can give the illusion of certainty where none exists.
The Takeaway: If the system’s confidence is hard to explain—or could be misleading—it’s often better to leave it out entirely.
4. When the System Can Act Alone
Confidence scores work best when users can act on them. But in some cases, the responsibility doesn’t belong to the user—it belongs to the system.
Take autonomous vehicles. A car might calculate that it’s “90% confident” it can safely navigate a busy intersection. But what does that mean for the passenger? Should they grab the wheel? Hit the brakes? Trust the car? Confidence scores in this context don’t empower the user; they just create anxiety.
Or consider content moderation tools. If a platform says, “This post is 80% likely to violate our guidelines,” what’s the takeaway? Does the moderator trust the system’s judgment or escalate it for review? The confidence score doesn’t simplify the decision—it complicates it.
In these scenarios, it’s often better for the system to take action (or escalate) behind the scenes, without involving the user in probabilities they’re not equipped to interpret.
The Takeaway: If users can’t act meaningfully on a confidence score, it’s better to let the system handle the uncertainty itself.
The Power of Restraint
Here’s the irony: while transparency can build trust, showing too much can erode it. Confidence scores are powerful tools, but they’re not always necessary—and they’re certainly not always helpful.
Sometimes, the best design decision is restraint. To ask yourself: does the user need this information to make a decision? Will it help, or will it hurt? And if the answer is “hurt,” it’s okay to leave it out.
After all, simplicity isn’t about hiding complexity—it’s about showing just enough to be useful. And in the end, that’s what good design is: giving people exactly what they need, and nothing they don’t.
Presenting Confidence Across AI Systems
Not all AI systems are created equal. Some can quantify their confidence with precision, while others operate with a kind of blissful ignorance. As designers, it’s our job to understand these differences and translate them into practical, user-facing choices.
Whether you’re designing for a large language model (LLM), a specialised diagnostic system, or a creative AI tool, the way you surface (or don’t surface) confidence needs to account for the system’s capabilities, the context of use, and what the user needs in the moment. Transparency isn’t one-size-fits-all—it’s a puzzle we solve by adapting to the unique nature of the system and the scenario at hand.
Here’s a breakdown of the practicalities across different types of AI systems:
1. Large Language Models (LLMs): Confidence Without Awareness
LLMs are, at their core, probabilistic engines. They generate outputs by predicting the next most likely word, phrase, or sentence. But here’s the catch: they have no inherent “awareness” of whether their outputs are correct. An LLM is just as confident in a hallucinated fact as it is in a well-documented one.
This makes surfacing confidence for LLM outputs particularly tricky. Any confidence score would have to be retroactively assigned—an estimation of certainty based on factors like how frequently the output aligns with training data or how well it matches patterns. But retroactive confidence scores are imperfect at best and misleading at worst. They risk creating a false sense of authority for a system that’s fundamentally guessing in a very convincing way.
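One rough proxy sometimes used is the model’s average per-token log-probability, which reflects how typical the wording was for the model, not whether it’s true. A minimal sketch, assuming you already have per-token log-probabilities from the model’s API (the numbers below are invented):

```python
import math

def proxy_confidence(token_logprobs: list[float]) -> float:
    """Geometric-mean token probability across an output.

    Caution: this is not a truthfulness score. A fluent hallucination
    can score high; it only measures how 'expected' the wording was.
    """
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_logprob)

# Invented per-token log-probabilities for a generated sentence:
logprobs = [-0.12, -0.40, -0.05, -1.30, -0.22]
print(f"Proxy confidence: {proxy_confidence(logprobs):.0%}")  # ~66%
```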
Practical Design Approach:
Avoid overly precise confidence scores (e.g., “87% confidence”). Instead, focus on framing outputs as suggestions or starting points, not definitive answers.
Use language that invites collaboration: “Here’s one way to think about this,” or “This is a starting point—does it look right to you?”
Highlight limitations explicitly: “This system generates responses based on patterns in data and may not always be accurate.”
When working with LLMs, the goal is to temper trust with healthy scepticism. It’s not about undermining the system but about creating a balanced, honest relationship between user and tool.
2. Specialised Systems: Confidence With Specificity
Specialised models—like diagnostic tools for medical imaging, fraud detection algorithms, or credit risk assessors—are often designed with confidence baked in. These systems are trained to analyse specific datasets and calculate the likelihood of particular outcomes. A diagnostic AI, for instance, might identify a suspicious mass on a CT scan and assign a confidence score to its prediction: “85% likelihood of malignancy.”
Here, confidence scores can be meaningful because they reflect a measurable degree of certainty based on the system’s training and performance. But numbers alone aren’t enough. A raw “85%” doesn’t help a doctor decide what to do next, and it might overwhelm a patient who’s seeing that figure for the first time.
Practical Design Approach:
Pair confidence scores with context: Explain what the number means and what factors contributed to it. For example, “This finding is based on similar patterns detected in 1,000 prior cases.”
Use visual aids to make confidence more intuitive: Highlight regions of interest on an image or show confidence ranges visually (e.g., shaded bands on a chart).
Provide next steps: Confidence scores are only useful if they guide action. For instance, “85% likelihood of malignancy—recommend biopsy to confirm.”
With specialised systems, the key is to channel precision into practicality. Confidence should guide users toward better decisions, not paralyse or mislead them.
3. Hybrid Systems: Collaborative AI Models
Some systems combine the strengths of different AI models—think of an LLM integrated with a diagnostic tool. For instance, a medical assistant app might use an LLM to interpret patient symptoms and a specialised diagnostic model to analyse lab results.
In these hybrid systems, the challenge isn’t just surfacing confidence—it’s navigating the interplay between different types of confidence. What happens when the LLM confidently suggests Diagnosis A, but the diagnostic model rates it as a low likelihood? How do we present these conflicting signals to the user without causing confusion?
Practical Design Approach:
Show confidence measures for each system separately, so users can see where the insights are coming from. For example:
“Symptom analysis (LLM): 70% likelihood of Diagnosis A.”
“CT scan analysis: 30% likelihood of Diagnosis A.”
Clarify how the systems interact: “The CT scan results suggest Diagnosis A is less likely. Consider further testing to confirm.”
Avoid aggregating confidence scores into a single figure—it can mask important nuances. Instead, focus on showing the reasoning behind each system’s output.
Hybrid systems demand clarity and coordination. It’s not enough to present probabilities—you need to show how the pieces fit together.
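As a sketch of keeping those signals distinct rather than collapsing them into one number (the data shapes here are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class ModelSignal:
    source: str        # which subsystem produced this signal
    finding: str       # what it suggests
    likelihood: float  # that subsystem's own confidence, from 0 to 1

def present(signals: list[ModelSignal]) -> str:
    """Show each subsystem's signal on its own line -- no aggregate score."""
    return "\n".join(
        f"{s.source}: {s.likelihood:.0%} likelihood of {s.finding}"
        for s in signals
    )

print(present([
    ModelSignal("Symptom analysis (LLM)", "Diagnosis A", 0.70),
    ModelSignal("CT scan analysis", "Diagnosis A", 0.30),
]))
```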
4. When Confidence Can’t Be Quantified
In some contexts, confidence simply isn’t meaningful. This is especially true for creative or exploratory tools, where the goal isn’t to deliver a “correct” answer but to generate ideas or spark possibilities.
Take a tool that helps users brainstorm product names. Should it assign a confidence score to each suggestion? Probably not—creativity doesn’t work that way. Or consider a music recommendation system. Saying “We’re 90% confident you’ll like this song” doesn’t matter as much as whether the user clicks “play.”
Practical Design Approach:
Focus on transparency about limitations: Be upfront about the system’s role in the process. For example, “This tool generates ideas based on patterns in existing product names—it’s a starting point, not a final answer.”
Frame outputs as collaborative or iterative: “Here are a few ideas—let us know if we’re on the right track.”
Emphasise the process over the result: In creative contexts, the value often lies in exploration, not certainty.
When confidence can’t be quantified, it’s better to lean into the system’s exploratory nature rather than force precision where it doesn’t belong.
The Right Fit for the Right System
Ultimately, presenting confidence isn’t about the system itself—it’s about how the system meets the user in their moment of need. The way we frame confidence for a diagnostic AI will look very different from how we handle it in a brainstorming tool or a hybrid model. And that’s okay.
What matters is that we, as designers, take the time to understand the systems we’re working with. That we map out the scenarios they’ll encounter and the ways users will rely on them. And that we shape transparency thoughtfully—not as a blanket rule, but as a set of tailored practices.
Because transparency isn’t about showing everything. It’s about showing the right things, in the right way, at the right time.
Wrapping Up: Designing for Honest Uncertainty
Design is an exercise in balance. We take messy, complex systems and shape them into something people can use, trust, and understand. But with AI, that balance gets trickier. These systems are powerful, probabilistic, and sometimes opaque—brilliant at generating fluent outputs, but often blind to their own limitations.
When we design for AI, we’re not just designing systems. We’re designing relationships: between the user and the tool, between the tool and the truth, between uncertainty and action. And as with any relationship, trust is key.
That trust doesn’t come from showing everything or explaining every detail. It comes from being honest about what the system can do—and what it can’t. It comes from knowing when to surface confidence and when to let it stay in the background. It comes from crafting experiences that respect the user’s context, the stakes of their decisions, and the ways their human minds interpret complexity.
Here’s the truth about designing for AI: there’s no single right answer. Sometimes the best thing we can do is expose uncertainty, frame probabilities, and let users weigh their options. Other times, the kindest choice is restraint—simplifying, hiding unnecessary complexity, and letting the system quietly do its job.
What’s clear is this: AI products are only as good as the trust they earn. That trust doesn’t come from pretending to be perfect. It comes from designing systems—and experiences—that are as honest in their uncertainty as they are in their promise.
When we get it right, it feels effortless. A suggestion that’s collaborative rather than decisive. A confidence score that leads to clarity, not confusion. A tool that helps people make better decisions, not just faster ones.
Because at the end of the day, the job isn’t to eliminate uncertainty. It’s to give people what they need to move forward, even when the future isn’t clear. That’s not just a design challenge—it’s a human one. And it’s one we’re uniquely equipped to solve.
Signal Path
AI is reshaping the way we design—our products, our tools, our jobs. Signal Path is a weekly exploration of the challenges and opportunities in making AI products intuitive, trustworthy, and a little more human. Written by Andrew Sims, a design leader working in Fintech, it’s for designers, product thinkers, and technologists grappling with AI’s impact on their work and the products they make. I’m not claiming to have all the answers—this is a way to think through ideas, articulate challenges, and learn as I go. Thank you for joining me as I navigate this path.