Contextual Bias Cues in LLM Interface

Team Credit

Diane Hu (Team lead); Christina Yang; Winnie Tsai; Xiang Chen; Ziru Wei

Course

User-Centered Research and Evaluation,
Human-Computer Interaction Institute, CMU

Duration

2025 Spring

Tools
Figma
Cursor
Intro

Designers increasingly use LLMs to learn about hard-to-access user groups, such as people with ADHD. This practice embeds unnoticed biases in their research, shaping design decisions without designers realizing it.

We built an LLM interface with Bias Cues to prompt critical reflection on AI-generated insights.

Bias cues prototype screens (v1 through v5)
Research Methods Overview
Research methods overview diagram

*Main research methods

*https://www.smashingmagazine.com/2020/05/research-study-double-diamond-model/

Problem Scope

How can we enable novice design students to recognize and mitigate biases embedded in ChatGPT outputs while they use the tool to brainstorm early-stage design concepts?

Abstraction laddering diagram for problem scope

We used "Abstraction Laddering" to brainstorm needs and ideas, which we then reframed into problems.

Research Protocol

Contextual Inquiry

Research Method

Novice designers with limited prior knowledge of ADHD are asked to complete a design challenge: generating user scenarios in ChatGPT to surface design opportunities.

While doing the task, participants are asked to think aloud. After the task, they are asked a few questions about how they usually use ChatGPT for user research or human-centered design, and about how they filter out biased information.

This method allows us to analyze the relationship between bias recognition and design opportunities in early-stage UX research, uncovering how designers interpret and address biases when working with AI-generated user research artifacts.

Research questions
  1. Which types of GenAI content most readily earn novice designers' unconditional trust?
  2. Which interaction behaviors with GenAI are most likely to introduce or exacerbate bias?
Protocol Structure
Protocol structure diagram

Protocol Iteration

Pilot: Provided an AI-generated empathy map for review only.

Iteration 1: Switched to open-ended interaction to observe authentic design workflows.

Iteration 2 (final): Narrowed the task to generating user scenarios and extracting insights, revealing how participants transform raw AI output into design opportunities.

Experimental Controls

All sessions used the GPT-4o model with online-search and reasoning enabled; deep-research mode was disabled.

To avoid priming, neither the moderator nor the study materials mentioned potential bias before or during the session.

Data Analysis

To find shared patterns in the collected data, we conducted three rounds of affinity diagramming, organizing notes into thematic clusters and ordering them from left to right along the user journey.

Affinity diagram session 1
Affinity diagram session 2
Affinity diagram full view

Two models emerged during data analysis: a user journey map and Rose-Thorn-Bud.

User journey map

The user journey map captured emotional shifts (confidence, confusion, ignorance) and pinpointed moments when bias awareness rose or fell.

User journey map

We found that:

  • Users start by leveraging variety in AI-generated content to mitigate inaccuracy, which provides a sense of control.
  • As their design tasks demand more precise and actionable insights, they adjust prompts to guide responses toward specificity. This shift from exploration to convergence exposes biases that were previously diluted, creating a fundamental tension in their strategy.
Rose, Thorn & Bud

The Rose-Thorn-Bud model is a formative feedback and thinking tool that helps us clearly identify existing strengths, problems, and potential opportunities.

We found that:

Rose: Participants actively seek primary data, request citations, and develop ad-hoc bias-mitigation tactics—indicating a latent need for transparent provenance.

Thorn: Trust is low yet reliance is high—designers act as though the model is authoritative because it saves effort.

Bud: System-supported best practices could build on these informal tactics.

Insight

Novices over-specify prompts to gain control, but the added constraints can inject new bias.

Representative Evidence

P2 requested "extreme hyper-active" scenarios; P5 embedded her own reasoning in the prompt, and the model responded with flattery and echoed her bias.

Implication

Offer interface controls to tune response diversity (e.g., "increase viewpoint diversity" or "reduce over-generalization").

Novelty bias: participants reward surprising output even when unverifiable.

Representative Evidence

"ChatGPT can fill in knowledge that I don't have." – P5

Implication

Encourage critical reflection on novelty; surface evidence for claims.

Many participants rely on AI-generated summaries rather than reading full text.

Representative Evidence

P4 asked ChatGPT to "simplify and combine" scenarios before deciding.

Implication

Position summaries as springboards for deeper inquiry rather than definitive answers; nudge users to inspect full text.

When the output neatly matches the prompt, scrutiny stops.

Representative Evidence

"I trust it when it answers my question as I requested." – P1

Implication

Insert reflective checkpoints or multi-perspective views to sustain critical engagement.

Assumption Artifact

Building on the contextual-inquiry insights, we ran a Crazy 8s sprint, distilled the strongest ideas into storyboards, and pressure-tested them with target users through speed-dating interviews.

This converged on a single high-stakes hypothesis:

Lightweight, in-flow bias prompts embedded in the ChatGPT workflow can help novice designers notice hidden stereotypes without disrupting their creative momentum.

To de-risk this hypothesis, we built an Assumption Artifact: a minimum viable prototype of ChatGPT augmented with toggleable bias prompts.

The artifact allowed us to examine two intertwined uncertainties:

Effectiveness: Will students actually notice, read, and leverage the prompts to spot bias?

Non-intrusiveness: Can the prompts avoid feeling like pop-up noise that interrupts or burdens the design flow?

Testing Protocol
Testing protocol diagram
Discussion

Prompts can boost short-term recognition of biased content.

  • About 62% of participants lingered on sentences flagged as biased, and 80% opened the linked readings.
  • Post-study survey scores on "which statements are biased?" rose relative to the pre-study baseline, suggesting short-term retention.

"Having opposing opinions presented to me (generated result and biase info) really helped me to identify design opportunities" — P8

The side-panel format feels tacked-on; its cognitive cost may outweigh perceived value.

  • Several students skimmed the snippets "for the answer" instead of reading deeply; others disliked having "unprompted information" appear beside the chat, describing it as distracting.
  • Interaction logs show hover/toggle time does not strongly correlate with learning gains, hinting that the current overlay feels tacked-on rather than integral to the task flow.
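
The comparison between interaction time and learning gains could be checked with a small script along these lines. This is only a sketch under stated assumptions: the record fields (`hover_seconds`, `pre_score`, `post_score`) and the sample values are hypothetical placeholders, not the study's data, and Pearson's r is one reasonable choice of statistic rather than necessarily the one the team used.

```python
import math

# Hypothetical per-participant records. The field names and values are
# illustrative placeholders only; they are NOT data from the study.
sessions = [
    {"id": "A", "hover_seconds": 10.0, "pre_score": 2, "post_score": 4},
    {"id": "B", "hover_seconds": 25.0, "pre_score": 3, "post_score": 4},
    {"id": "C", "hover_seconds": 40.0, "pre_score": 2, "post_score": 3},
    {"id": "D", "hover_seconds": 15.0, "pre_score": 4, "post_score": 5},
]


def learning_gains(records):
    """Return the post-minus-pre survey gain for each participant."""
    return [r["post_score"] - r["pre_score"] for r in records]


def pearson_r(xs, ys):
    """Compute Pearson's correlation coefficient for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


def hover_gain_correlation(records):
    """Correlate time spent on the bias cues with the survey gain."""
    hover = [r["hover_seconds"] for r in records]
    return pearson_r(hover, learning_gains(records))


if __name__ == "__main__":
    print(f"r = {hover_gain_correlation(sessions):.2f}")
```

With a sample this small, a coefficient near zero would only hint at a weak relationship; it would not establish that the cues have no effect.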

"I really don't like to have unprompted information shown up by the side of my generated content." — P10

Future Work

Understanding common biases associated with certain types of user groups is vital for equipping designers to interrogate and refine language-model outputs.

Yet participant feedback suggests that the current side-panel implementation remains superficial and fails to entice deeper engagement.

To remedy this, future work could embed bias cues directly within the generated content, making the influence of specific stereotypes both visible and actionable.

Inline Integration

Replace the peripheral panel with inline highlights that flag potentially biased phrases inside the ChatGPT response and dock concise explanations next to each highlight. A controlled A/B study will compare this inline approach with the existing side-panel design to determine which format sustains learning and minimizes workflow disruption.
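
One way to picture the inline approach is the minimal sketch below. It assumes a hand-written phrase lexicon as a stand-in for whatever bias detector a real prototype would use; `BiasFlag`, `flag_biased_phrases`, `render_inline`, and the lexicon entry are all hypothetical names introduced for illustration, and the `[[...]]` markers simply stand in for a visual highlight.

```python
import re
from dataclasses import dataclass


@dataclass
class BiasFlag:
    """A flagged span in the response plus the explanation docked next to it."""
    start: int
    end: int
    phrase: str
    explanation: str


# Hypothetical lexicon: phrase -> short explanation. A real system would
# likely rely on a classifier or curated sources rather than fixed strings.
BIAS_LEXICON = {
    "always distracted": "Over-generalization: presents one trait as universal.",
}


def flag_biased_phrases(text, lexicon):
    """Find every case-insensitive occurrence of a lexicon phrase in the text."""
    flags = []
    for phrase, explanation in lexicon.items():
        pattern = re.compile(re.escape(phrase), re.IGNORECASE)
        for match in pattern.finditer(text):
            flags.append(
                BiasFlag(match.start(), match.end(), match.group(0), explanation)
            )
    return sorted(flags, key=lambda f: f.start)


def render_inline(text, flags):
    """Wrap each flagged span in [[...]] and place its explanation right after."""
    parts = []
    cursor = 0
    for flag in flags:
        if flag.start < cursor:
            continue  # skip spans that overlap one already rendered
        parts.append(text[cursor:flag.start])
        parts.append(f"[[{text[flag.start:flag.end]}]] ({flag.explanation})")
        cursor = flag.end
    parts.append(text[cursor:])
    return "".join(parts)


if __name__ == "__main__":
    response = "People with ADHD are always distracted at work."
    print(render_inline(response, flag_biased_phrases(response, BIAS_LEXICON)))
```

Keeping detection (`flag_biased_phrases`) separate from presentation (`render_inline`) would let the same flags drive either the inline or the side-panel variant in a comparison study.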

Contextual Rationale

For every flagged phrase, explicitly articulate how the associated stereotype shapes or distorts the surrounding text, and link to primary sources for further reading. This contextual scaffolding should transform one-off prompts into a coherent narrative that motivates novice designers to interrogate, rather than merely acknowledge, bias.