Case Study · 2025 Global Insurance Brokerage · Enterprise AI

Project Aura

When commercial insurance clients submit their ESG documentation, someone has to read it. Every utility bill, every emissions certificate, every compliance report. At Marsh McLennan's CIS division, that someone was an analyst — and it was taking four hours per case. I was brought in to fix that. But the real problem turned out to be more interesting than speed.

Role: Lead AI Product Designer
Duration: 8 Months · 2025
Domain: Insurance Brokerage · EU ESG
Target Market: Germany / Amsterdam
65%
Audit Processing Time Reduction
↑ From 4.2hr → 1.4hr avg per audit
40%
Increase in Anomaly Detection Rate
↑ Errors surfaced that were previously missed
4.8
AI-SUS Trust Score (out of 5.0)
↑ "I felt in control of the outcome"
0
Backend Infrastructure Rewrites Required
↑ 15-year legacy system preserved
AURA COPILOT v2.4 · SECURE INSTANCE
AURA COPILOT
Analysing utility bill — Client Ref: MMC-EU-4471…
Scope 2 Emissions identified: 847 tCO₂e/yr. 14.3% above sector benchmark.
📎 Source: Bill_Q3_2024.pdf, p.7
Confidence
94%
✓ Approve
✕ Flag
Section 02 · The Problem

Four hours a case.
Every case. Every quarter.

The CIS analysts were doing something that felt like it should have been solved already. They'd open a PDF, find the Scope 2 emissions figure on page 34, type it into the legacy system, then move to the next document. Over and over, for every client, every quarter. Four hours per case. Constant transcription errors. A growing backlog as EU regulations demanded ever more granular ESG data in every premium calculation. And with the EU's CSRD regulation coming into force, the error rate was about to become a legal problem, not just an operational one.

4.2h
Time a risk analyst spent on a single client premium assessment. Most of that wasn't analysis — it was hunting for numbers across PDFs and copying them into the legacy system.
23%
Error rate in ESG data entry before Aura. When those errors feed into the premium calculation model, clients end up with the wrong price — and we carry the liability.
EU CSRD
Incoming regulatory mandate requiring verifiable ESG audit trails — a compliance clock ticking against every manual workflow.
0
Existing AI tooling within the CIS division. Every extract was manual, every citation was verbal, every audit was forensically unverifiable.

Stakeholder Alignment Framework

🏢
CIS Leadership · Speed & Scalability
"We need to onboard EU clients 3× faster without proportionally scaling analyst headcount."
→ Delivered: Aura reduces per-audit time from 4.2h to 1.4h. 3× throughput achieved without headcount increase.
⚖️
Legal & Compliance · Zero-Hallucination Mandate
"Any AI output that cannot be traced to a cited, internal source document is a regulatory liability. Full stop."
→ Delivered: Closed-loop RAG architecture cites only vetted internal PDFs. Every AI output is source-pinned and human-approved before commit.
⚙️
Engineering · No Backend Rewrite
"The legacy system is 15 years old. We cannot risk a full rewrite. Any solution must integrate via API overlay only."
→ Delivered: Aura is a read/write API layer. The legacy database is never directly modified. Engineers approved the pattern in Sprint 2.
User Research Synthesis · CIS Division Sprint 1–2 · Contextual Inquiry
Discover
Pain Point Analysts spend 40 min locating correct emission factor tables across 6 different PDF versions.
Observation 3 of 5 analysts maintain personal Excel sheets to track data discrepancies — a shadow system.
Quote "I'm an insurance analyst, not a data parser. I should be doing risk assessment."
Data Point Average 2.3 re-opens per PDF document per audit session — no persistent extract state.
Define
HMW How might we surface the most critical ESG data points without requiring manual document navigation?
Constraint Legal requires every AI output to be traceable to a specific page & paragraph of the source document.
Insight Trust in AI is conditional on legibility. Analysts will accept AI suggestions only if they understand the reasoning chain.
Validated
Validated Human-approval gate before any AI data is written to the legacy system. Non-negotiable UX requirement.
Validated Inline source citation (doc name + page number) immediately adjacent to every AI-extracted value.
Validated Confidence score display increases analyst click-through to source verification by 62%.
Section 03 · The Triad of Adoption

Three people.
Three definitions of “this works.”

The thing that made this hard wasn't the AI. It was the people. The analyst needed to trust the output before acting on it. The compliance officer needed a legally defensible audit trail. The engineer needed the system to not touch a 15-year-old database that nobody fully understood anymore. Three people, three completely different definitions of "this works." If I got one wrong, the whole thing would get rejected.

👩‍💼
Sarah M.
Senior Risk Analyst · CIS
"If I can't verify where that number came from, I cannot sign the audit. Period. I don't care how fast the AI is."
Experience
11 Years · Insurance
Audits / Week
12–18 Cases
Tech Fluency
Medium–High
AI Sentiment
Cautiously Skeptical
Precise data extraction with inline source citations (PDF name + page)
Confidence scores displayed adjacent to every AI-generated value
One-click source verification — must open the exact PDF page highlighted
Current Cognitive Load: Critical
👨‍💼
Thomas K.
CIS Lead · Workflow Oversight
"I need to see team throughput, where bottlenecks are forming, and whether the AI is actually accelerating output — not just shifting the work."
Experience
17 Years · Risk Mgmt
Reports To
VP, Global Risk
KPI Focus
Throughput + SLA
AI Sentiment
Pragmatically Optimistic
Real-time audit pipeline dashboard showing team-level processing velocity
AI override rate tracking — flags when analysts are systematically rejecting AI outputs
Weekly comparative report: AI-assisted vs. manual audit quality delta
Current Cognitive Load: High
🧑‍⚖️
Julian V.
Compliance Officer · EU Regulatory
"Under GDPR and CSRD, I need a complete, immutable log of every AI decision, every human override, and every data source referenced. This must be exportable for regulator review."
Experience
9 Years · EU Compliance
Jurisdiction
GDPR · CSRD · SFDR
Review Cadence
Quarterly Audits
AI Sentiment
Deeply Suspicious
Immutable GDPR-compliant audit log: AI query → source cited → human decision → timestamp
Zero external data egress — all RAG queries resolved against internal document stores only
One-click PDF export of full decision trail, formatted for EU regulator submission
Current Compliance Risk: Severe
Section 04 · How I Worked Through It

I had to change
the process for this one.

I've run Double Diamond processes on maybe twenty products. Aura was the first time I had to adapt it fundamentally. The problem with applying a standard UX process to AI is that you're designing for an output you can't fully predict. The AI might be right 94% of the time — but you have to design for that other 6% as carefully as for the rest. I started treating the AI's behaviour as a design material, like a constraint, rather than a feature.

01 · Discover
Data Readiness & Contextual Inquiry
Embedded with analysts for 3 weeks. Shadow sessions mapping the exact document navigation patterns, error points, and trust signals in existing workflows.
Shadow Sessions Data Audit Stakeholder Interviews AI Readiness Score
02 · Define
Intent Mapping & Human-in-the-Loop Strategy
Mapped analyst intent patterns across 240 audit sessions. Defined the exact intervention points where AI autonomy must yield to human judgment.
Intent Maps HITL Framework Failure Mode Analysis Trust Model
03 · Design
Generative UI & XAI Architecture
Built the Explainability Seam system — a UI pattern ensuring every AI output exposes its reasoning chain, confidence, and source attribution as first-class interface elements.
Explainability Seams Generative UI ProtoPie AI Flows XAI Patterns
04 · Test
Wizard of Oz Prototyping & Trust Calibration
Used WoZ methodology to simulate AI behaviour before model integration. Intentionally injected a 5.5% calculation error to test for Automation Bias. 100% detection rate.
WoZ Prototype Automation Bias Test AI-SUS Scoring Trust Calibration
The AI Adaptation: Why the Standard Diamond Fails
Classical Double Diamond assumes deterministic outputs at each stage. AI systems are fundamentally probabilistic — the "correct" design for a 94% confidence AI output is categorically different from a 67% confidence output. Aura's process introduces an AI Behaviour Calibration Loop between Define and Design: a phase where we audit model outputs against real document sets, map failure modes before UI design begins, and define the exact confidence thresholds that trigger different UI states. This prevents the critical mistake of designing for idealised AI performance.
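To make the calibration loop concrete, here is a minimal sketch in TypeScript, with hypothetical names and illustrative bucket edges; the case didn't ship this exact code. It buckets labelled model outputs by confidence and measures accuracy per bucket, so the thresholds that drive UI states come from observed behaviour rather than assumed performance.

```typescript
// Hypothetical calibration sketch: bucket labelled model outputs by
// confidence and measure accuracy per bucket, so the thresholds that
// trigger different UI states are derived from observed behaviour.

interface LabelledExtraction {
  confidence: number; // model-reported confidence, 0..1
  correct: boolean;   // verified against the ground-truth document set
}

interface BucketStats {
  range: string;
  count: number;
  accuracy: number;
}

function calibrationReport(
  samples: LabelledExtraction[],
  bucketEdges: number[] = [0.6, 0.75, 0.9, 0.98], // illustrative edges
): BucketStats[] {
  const edges = [0, ...bucketEdges, 1.0001]; // sentinel so 1.0 lands in the top bucket
  return edges.slice(0, -1).map((lo, i) => {
    const hi = edges[i + 1];
    const inBucket = samples.filter(s => s.confidence >= lo && s.confidence < hi);
    const correct = inBucket.filter(s => s.correct).length;
    return {
      range: `${Math.round(lo * 100)}-${Math.round(Math.min(hi, 1) * 100)}%`,
      count: inBucket.length,
      accuracy: inBucket.length > 0 ? correct / inBucket.length : NaN,
    };
  });
}
```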
Section 05 · System Architecture

Why the data
never leaves.

One decision I'm most proud of: the data never leaves the private cloud. I pushed for this early, even before the engineers had a strong opinion on it. Under GDPR, if client emissions data touches an external API, you have an egress problem. By making the architecture closed-loop from the start, we didn't just solve a compliance requirement — we made hallucination structurally impossible. The AI can only reason about documents that are already inside the system.

AURA · SYSTEM ARCHITECTURE DIAGRAM · CONFIDENTIAL
LAYER 1 💬 Analyst Intent: Natural Language Query Input & Document Upload
↓ NLP Intent
LAYER 2 🧠 AI Gateway: NLP Router · Intent Classification · Agent Orchestration
↓ Retrieval Query
LAYER 3 🔒 RAG Engine: Semantic Search · GDPR-Compliant · Internal Sources Only
↓ Structured Output
LAYER 4 📋 Data Card UI: Structured Output · Human Approval Gate · Legacy System Push

Feeding the RAG Engine: 📁 Internal Legacy Insurance Database · 📄 Uploaded Client PDF Documents

🔒 GDPR Compliant · No External Egress
All RAG queries resolved against internal data stores exclusively. Zero web access. Zero third-party API calls.
01
Why RAG Prevents Hallucination
Standard LLMs generate answers from parametric memory — they confabulate plausibly when uncertain. Our RAG architecture forces the model to retrieve before generating: it can only output values that exist verbatim in the indexed source documents. If a data point isn't in the internal knowledge base, Aura returns a structured "No verified source found" card — never a fabricated figure.
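A minimal sketch of that retrieve-before-generate contract, in TypeScript. The type and function names are illustrative, not the production API; the point is the control flow: no retrieval hit, no generated value.

```typescript
// Hypothetical sketch of the retrieve-before-generate contract. If
// retrieval returns nothing, the UI receives a structured refusal card
// instead of any generated value.

interface SourcePassage {
  docName: string; // e.g. "Bill_Q3_2024.pdf"
  page: number;
  text: string;
}

type ExtractionCard =
  | { kind: 'extracted'; value: string; confidence: number; source: SourcePassage }
  | { kind: 'no-verified-source'; query: string };

async function extractWithRag(
  query: string,
  retrieve: (q: string) => Promise<SourcePassage[]>, // internal index only
  generate: (q: string, ctx: SourcePassage[]) => Promise<{ value: string; confidence: number }>,
): Promise<ExtractionCard> {
  const passages = await retrieve(query);
  if (passages.length === 0) {
    // Nothing in the vetted corpus: return a structured refusal, never a guess.
    return { kind: 'no-verified-source', query };
  }
  const { value, confidence } = await generate(query, passages);
  // Pin the answer to its highest-ranked supporting passage for citation.
  return { kind: 'extracted', value, confidence, source: passages[0] };
}
```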
02
The Intent Router: Multi-Agent Orchestration
The AI Gateway classifies analyst queries into three agent tracks: Extraction (pull structured values from PDFs), Comparison (benchmark against sector norms in the legacy DB), and Compliance Check (verify against CSRD/SFDR thresholds). Each track has independent confidence thresholds and UI states, preventing a single model failure from cascading across the interface.
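Sketched below in TypeScript with illustrative thresholds (the production values were set during the calibration loop, and the names are hypothetical): each track carries its own bar, and anything under it renders in a review-required state.

```typescript
// Hypothetical sketch of the three-track router. Each track carries its
// own confidence threshold, so a weak extraction can't silently pass
// the stricter compliance-check bar.

type AgentTrack = 'extraction' | 'comparison' | 'compliance-check';

const TRACK_THRESHOLDS: Record<AgentTrack, number> = {
  // Illustrative values; the real thresholds came out of the
  // calibration loop described in Section 04.
  'extraction': 0.9,
  'comparison': 0.85,
  'compliance-check': 0.95,
};

interface RoutedQuery {
  track: AgentTrack;
  confidence: number;
}

type UiState = 'auto-suggest' | 'needs-review';

function uiStateFor({ track, confidence }: RoutedQuery): UiState {
  // Below the track's threshold, the card is rendered in a
  // review-required state rather than as a ready-to-approve suggestion.
  return confidence >= TRACK_THRESHOLDS[track] ? 'auto-suggest' : 'needs-review';
}
```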
03
The GDPR Compliance Guarantee
Julian's core requirement was zero data egress. We achieved this architecturally: the RAG retrieval engine operates entirely within the client's Azure private cloud, with no outbound API calls permitted at the network level. Compliance is enforced by infrastructure, not just policy — making it auditable and demonstrably verifiable to EU regulators.
04
The Human Approval Gate
No AI-generated value is written to the legacy database without explicit human approval. The Data Card UI presents each extraction as a discrete, reviewable unit with its source citation, confidence score, and a binary Approve/Flag control. This gate generates the immutable audit trail Julian requires — every decision is timestamped, attributed to a named analyst, and stored as an append-only compliance record.
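A minimal sketch of the gate's logic, assuming a TypeScript service layer; the field names and the in-memory log are illustrative stand-ins for the real append-only store.

```typescript
// Hypothetical sketch of the approval gate. Nothing reaches the legacy
// write path without an explicit analyst decision, and every decision
// appends an audit record.

interface AuditRecord {
  readonly timestamp: string;   // ISO 8601
  readonly analyst: string;     // named, never anonymous
  readonly decision: 'approved' | 'flagged';
  readonly sourceDoc: string;   // e.g. "Bill_Q3_2024.pdf, p.7"
  readonly value: string;
}

const auditLog: AuditRecord[] = []; // stand-in for the immutable compliance store

function approveCard(
  analyst: string,
  card: { value: string; sourceDoc: string },
  decision: 'approved' | 'flagged',
  pushToLegacy: (value: string) => void,
): AuditRecord {
  const record: AuditRecord = {
    timestamp: new Date().toISOString(),
    analyst,
    decision,
    sourceDoc: card.sourceDoc,
    value: card.value,
  };
  auditLog.push(record); // log first, so even a failed push stays traceable
  if (decision === 'approved') {
    pushToLegacy(card.value); // the only route into the legacy database
  }
  return record;
}
```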
Section 06 · The Core Design Challenge

15 years of constraints.
We didn’t touch it.

The legacy database was built in 2009. It's been extended seventeen times since then by teams who are mostly no longer at the company. Every global insurance calculation Marsh McLennan runs touches it somewhere. The engineers were clear: nothing writes to it directly. Not the AI, not any new code. We had to design the entire approval workflow around that constraint — which meant the human approval gate wasn't just a trust feature. It was the only safe path to the system.

BEFORE · Legacy CIS System
File Edit View Reports Database Help
New Record Search Import Export CSV Print
CLIENT REF    EMISS.   SCOPE   FY     STATUS   SRC
MMC-EU-4471   847      SC2     2024   PEND     –
MMC-EU-4472   ???      SC1     2024   ERR      PDF?
MMC-EU-4473   1,204    SC3     2023   DONE     xls
⚠ RECORD LOCK TIMEOUT — manual re-entry required. Source document reference lost.
AFTER · Aura Copilot Overlay
◈ AURA COPILOT
Overlay Mode · Read/Write API v2.4
Connected
Extracted Data · MMC-EU-4471 Awaiting Approval
Scope 2 Emissions 847 tCO₂e / yr
Source Document Bill_Q3_2024.pdf · Page 7, Para 2
Sector Benchmark 740 tCO₂e · +14.3%
Confidence Score 94.2% ✓
CSRD Flag ⚠ Article 29b Threshold Exceeded
▶ Approve & Push to Legacy DB
✓ AUDIT LOG: 14:32:07 · Sarah M. · Approved · Source: Bill_Q3_2024.pdf
01
API Overlay Architecture — Zero Backend Risk
Aura sits entirely above the legacy database as a read/write API overlay. Reads extract data for AI processing. Writes only occur when a named analyst explicitly approves a structured data card. The legacy schema is never modified — we insert, never restructure. This was the single design decision that unlocked engineering buy-in within two sprint cycles.
02
Structured Data Cards — The Bridge Between AI and Legacy Schema
Rather than requiring analysts to manually translate AI outputs into legacy field formats, Data Cards are pre-mapped to the legacy database schema at design time. Each card field has a corresponding legacy DB column, validated on push. Analysts work in Aura's clean interface; the legacy system receives clean, schema-validated records. The translation layer is invisible to the user.
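A sketch of what that design-time mapping might look like in TypeScript. The column names and validators here are illustrative; the real schema is internal to the client.

```typescript
// Hypothetical sketch of the design-time card-to-column mapping. Each
// card field names its legacy DB column and a validator, so the push is
// schema-checked before any insert.

interface CardFieldSpec {
  legacyColumn: string;
  validate: (raw: string) => boolean;
}

// Illustrative mapping; actual column names are internal to the client.
const SCOPE2_CARD: Record<string, CardFieldSpec> = {
  clientRef:  { legacyColumn: 'CLIENT_REF', validate: v => /^MMC-EU-\d{4}$/.test(v) },
  emissions:  { legacyColumn: 'EMISS',      validate: v => !isNaN(Number(v.replace(',', ''))) },
  scope:      { legacyColumn: 'SCOPE',      validate: v => ['SC1', 'SC2', 'SC3'].includes(v) },
  fiscalYear: { legacyColumn: 'FY',         validate: v => /^\d{4}$/.test(v) },
};

function toLegacyRow(card: Record<string, string>): Record<string, string> {
  const row: Record<string, string> = {};
  for (const [field, spec] of Object.entries(SCOPE2_CARD)) {
    const value = card[field];
    if (value === undefined || !spec.validate(value)) {
      throw new Error(`Card field "${field}" failed legacy schema validation`);
    }
    row[spec.legacyColumn] = value; // insert-only: no other columns are touched
  }
  return row;
}
```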
03
Progressive Adoption — The Desk-Level Change Management Strategy
We launched Aura in read-only mode for the first four weeks. Analysts could see AI extractions without any obligation to use them — removing adoption anxiety entirely. The write function was introduced only after trust was established through demonstrated accuracy. This phased approach drove 87% voluntary adoption in the first quarter without a single mandate from leadership.
Section 07 · Usability & Trust Testing

The dangerous failure
isn’t a wrong AI.

The thing I kept coming back to in research was this: the most dangerous thing isn't an AI that's wrong. It's an analyst who trusts a wrong AI without checking. Automation bias. We ran six weeks of Wizard of Oz testing — a human playing the role of the AI — specifically to find the moments where analysts would stop reading carefully. We found three. We fixed all three before a single model was trained.

Hallucination Detection Heatmap · Task T-03
Legend: High Dwell · Source Click
ESG AUDIT REPORT · Client MMC-EU-4471 · FY2024
Annual Energy Consumption: 12,440 MWh
Scope 1 Direct Emissions: 312 tCO₂e
Scope 2 Market-Based Emissions: 893 tCO₂e 📎 p.7
Renewable Energy Ratio: 34.2%
CSRD Compliance Threshold: At Limit (Article 29b)
We planted a 5.5% error in the Scope 2 value — 893 instead of the correct 847 tCO₂e. Every single analyst caught it, because they clicked through to the source. That's what the explainability seam is for. Not decoration. Not compliance theatre. The thing that makes the system catchable when it's wrong.
Human Override Rate Over 4 Weeks · Post-Launch Declining = Calibrated Trust Growth
Week 1: 68% · Week 2: 48% · Week 3: 29% · Week 4: 14%
Human override rate declined from 68% in Week 1 to 14% in Week 4 — a signal of calibrated trust growth, not complacency. The target band is 10–20%, maintaining meaningful human oversight while indicating AI reliability.
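The monitoring check behind that target band is small enough to sketch. A hypothetical TypeScript version, using the 10–20% band stated above; this is also the kind of override-rate tracking Thomas asked for in Section 03.

```typescript
// Hypothetical sketch of the override-rate monitor implied by the
// 10-20% target band: too high suggests distrust of the AI, too low
// suggests automation bias creeping in.

type TrustSignal = 'distrust' | 'calibrated' | 'possible-complacency';

function classifyOverrideRate(
  overrides: number,
  totalDecisions: number,
  band: { low: number; high: number } = { low: 0.10, high: 0.20 },
): TrustSignal {
  const rate = overrides / totalDecisions;
  if (rate > band.high) return 'distrust';
  if (rate < band.low) return 'possible-complacency';
  return 'calibrated';
}

// Week 4 from the chart above: 14% sits inside the band.
classifyOverrideRate(14, 100); // evaluates to 'calibrated'
```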
4.8/5
AI-SUS Score
"I felt in control
of the outcome"
100%
Error Detection Rate
Injected 5.5% anomaly
n=12 analysts
87%
Voluntary Adoption
Rate in Q1
No mandate required
14%
Final Override Rate
Week 4 · Within
Target Trust Band
94%
Average AI Output
Confidence Score
in Production
🧙
Wizard of Oz Methodology — Testing Before Model Integration
Before the Azure OpenAI model was integrated, a human "wizard" simulated AI responses behind the Aura interface during testing sessions. Analysts believed they were interacting with the live AI. This allowed us to test the UI's trust calibration mechanics, the source citation interaction patterns, and the approval gate under realistic conditions — without any model hallucination risk. The WoZ sessions revealed that analysts required confidence scores to be visible within 2 seconds of query submission, a latency requirement we fed directly into the engineering SLA.
Section 08 · Results & ROI

What actually
changed.

65%
Reduction in Audit Processing Time
4.2h → 1.4h per case
Measured: Jira Time Tracking · n=47 audits
40%
Increase in Anomaly
Detection Rate
Measured: Compliance Review · 6 months
3×
Client Onboarding Throughput
No headcount increase
Measured: CIS Operations Report · Q3 2025
£0
Backend Infrastructure
Spend Required
Engineering Sign-off · API Overlay Only

Secure Enterprise Tool Stack

Every tool was mandated to be enterprise-licensed. Consumer-grade AI tools were explicitly prohibited by Legal — any tool processing client data required contractual GDPR compliance and EU data residency guarantees.

🎨
Figma AI Enterprise
UI Design · Prototyping
Used for all high-fidelity UI design. Enterprise license ensures design file data never transits Figma's consumer AI training pipeline. Variables & component tokenisation directly mirrors the Aura design system for handoff fidelity.
✓ GDPR · EU Data Residency
⚙️
ProtoPie Enterprise
AI Interaction Prototyping
Prototyped all AI interaction states — loading, confidence thresholds, error states, and the Wizard of Oz simulation layer. ProtoPie's sensor-driven interactions accurately simulated the latency and state-change patterns of the live Azure OpenAI integration before engineering build.
✓ ISO 27001 · SOC 2 Type II
🧠
Azure OpenAI
LLM · RAG Infrastructure
The underlying language model for the RAG engine. Critical distinction: Azure OpenAI's private deployment means the client's data is never used for model training and remains within the European Azure region. This was the only LLM provider that met Legal's contractual requirements.
✓ Azure EU Region · Data Isolation
📊
Dovetail Enterprise
Research Synthesis · AI Analysis
All user research recordings, interview transcripts, and contextual inquiry notes were synthesised in Dovetail's secure enterprise environment. Dovetail AI was used to cluster pain points and surface patterns across 240+ analyst touchpoints — a process that would have taken weeks manually.
✓ SOC 2 · No Data Training
🔬
Maze Enterprise
Usability Testing · Analytics
Quantitative usability testing at scale. The hallucination heatmap and trust calibration data were generated through Maze's analytics suite. AI-assisted session analysis surfaced the 2-second confidence score latency requirement from behavioural heatmap data that manual analysis would have missed.
✓ GDPR Article 5 Compliant
📐
Figma Variables API
Design System · Tokenisation
Aura's design system tokens (colours, spacing, typography scale) are stored as Figma Variables and exported via the Variables API directly into the engineering team's CSS custom properties. This eliminated the 2-week design-to-dev token reconciliation that previously caused visual regressions at every sprint boundary.
→ Direct Figma → CSS Handoff
Section 09 · The Design System Specification

Every element earns
its place or it goes.

I had one rule for every visual decision in the interface: does this reinforce trust, or does it communicate uncertainty? If it does neither, it doesn't belong on the screen. No decorative anything. The confidence bar isn't branding. The source citation isn't metadata. They're the product.

01 · Colour Palette · Figma Variables Ready
BG/Void: #040D1A
BG/Base: #070F1E
BG/Surface: #0D1B2E
BG/Elevated: #112240
AI/Cyan·Glow: #00D4FF
AI/Cyan·Mid: #0EA5D4
AI/Cyan·Dark: #0277A0
Text/Primary: #E8F0FE
Text/Secondary: #8892A4
Semantic/Success: #00D084
Semantic/Warning: #FFAA00
Semantic/Error: #FF4D6A
📐 Figma Variable Mapping: All 12 swatches map to CSS custom properties (--bg-void, --cyan-glow, etc.) via the Figma Variables API export. Semantic colours are defined as Figma Variable Aliases pointing to their base colour tokens, enabling theme switching without changing component references.
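A minimal sketch of that export step in TypeScript, using the palette values above. The function is illustrative, not the actual Figma Variables API pipeline, and the token names beyond --bg-void and --cyan-glow are assumptions following the same convention.

```typescript
// Hypothetical sketch of the token export: a flat token map serialised
// into CSS custom properties, mirroring the Figma Variables handoff
// described above.

const colourTokens: Record<string, string> = {
  'bg-void': '#040D1A',
  'bg-base': '#070F1E',
  'bg-surface': '#0D1B2E',
  'bg-elevated': '#112240',
  'cyan-glow': '#00D4FF',
  'cyan-mid': '#0EA5D4',
  'cyan-dark': '#0277A0',
  'text-primary': '#E8F0FE',
  'text-secondary': '#8892A4',
  'semantic-success': '#00D084',
  'semantic-warning': '#FFAA00',
  'semantic-error': '#FF4D6A',
};

// Emits a :root block of CSS custom properties, one per token.
function toCssCustomProperties(tokens: Record<string, string>): string {
  const lines = Object.entries(tokens).map(([name, hex]) => `  --${name}: ${hex};`);
  return `:root {\n${lines.join('\n')}\n}`;
}
```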
02 · Typography Scale · 8pt Grid Aligned
Display / Hero · Syne 800 · clamp(3–6rem) · Sample: "Aura System"
Heading 1 · Syne 700 · clamp(2.2–3.75rem) · Sample: "Section Title"
Heading 2 · Syne 700 · 1.5rem / 24px · Sample: "Card Title"
Body / Regular · DM Sans 400 · 1rem / 16px · Sample: "Paragraph text for analysis and descriptions across the interface."
Body / Small · DM Sans 400 · 0.875rem / 14px · Sample: "Supporting text, captions, and secondary labels."
Mono / Data · JetBrains Mono 500 · 0.8125rem / 13px · Sample: "847 tCO₂e · 94.2% · MMC-EU-4471"
Mono / Label · JetBrains Mono 400 · 0.75rem / 12px · Sample: "SCOPE 2 EMISSIONS · GDPR"
03 · Spacing Tokens · 8pt Grid System
sp-1: 8px · sp-2: 16px · sp-3: 24px · sp-4: 32px · sp-5: 40px · sp-6: 48px · sp-8: 64px · sp-10: 80px
04 · Component Library · Production States
Button / Primary
Button / Secondary
Button / Danger
Button / Ghost
Input / Default
Status Badges
Verified · Pending Review · Flagged · AI Extracted · CSRD Review
🗂 Figma Layout Blueprint: All components use Auto Layout with 8px base padding increments. Components are built with Variants covering: Default, Hover, Focused, Loading, Disabled, and Error states. The AI Confidence Score component has 5 Variants mapped to confidence bands: <60% (Error), 60–74% (Warning), 75–89% (Neutral), 90–97% (Success), 98–100% (Verified). Border radius uses a 6px / 12px / 20px token scale mapped to Component / Card / Modal hierarchy respectively.
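A minimal sketch of that Variant mapping in TypeScript, using the exact bands listed above; the function name is illustrative.

```typescript
// Maps an AI confidence score (in percent) to the five component
// Variants defined in the blueprint above.

type ConfidenceVariant = 'Error' | 'Warning' | 'Neutral' | 'Success' | 'Verified';

function confidenceVariant(score: number): ConfidenceVariant {
  // Bands as specified: <60, 60-74, 75-89, 90-97, 98-100 (percent).
  if (score < 60) return 'Error';
  if (score < 75) return 'Warning';
  if (score < 90) return 'Neutral';
  if (score < 98) return 'Success';
  return 'Verified';
}

confidenceVariant(94.2); // 'Success', the band shown on the MMC-EU-4471 card
```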