Gradient Drift

Version: 1.0 · Updated: Nov 2025

Project 1 - RAG Search Assistant

Executive Summary

Aurenis, an under-performing healthcare company, has faced three consecutive quarters of stagnation. Its recent lackluster performance is due in large part to the emergence of fierce new rivals, which have made market conditions significantly more competitive; this has been borne out by the sudden loss of two of the business’s largest customers in quick succession. While both factors have had a devastating impact on Aurenis’s once-dominant market position, they do not represent the complete picture. For quite some time, Aurenis’s staff on the ground have been battling a variety of operational challenges that have made it increasingly difficult to maintain the level of quality and output that had been a hallmark of the business to date.

Although the firm is notoriously risk-averse when it comes to embracing new technologies, the business imperative, coupled with the arrival of an ambitious new CTO, has compelled it to take immediate action. The company has decided to break with tradition and implement the latest AI enterprise software across its customer-facing teams. A new RAG Search Assistant will be rolled out aggressively to the Sales, Support and Operations teams with the stated aim of shortening the time it takes to close new business, onboard customers and resolve customer issues, which in turn is expected to lead to vastly superior CSAT scores. It is hoped that the initiative will not only drive growth but propel Aurenis back toward the market-leading position it held for many years.

Context

Aurenis, a medium-sized healthcare company with a presence in four markets, is facing a major inflection point. The business has consistently failed to meet the ambitious targets set by the leadership team, and investor confidence is at an all-time low, with the company share price veering towards terminal decline. If the business climate is unfavourable on a macro level, the sentiment among staff operating at the grassroots is decidedly worse.

A key bone of contention among workers in customer-facing roles has been the company’s over-reliance on outdated processes and systems, which leaves them battling constant burnout and feeling as though they are running on a perpetual treadmill merely to stand still. Customer service excellence is the company’s core value and remains a deeply rooted part of its culture. Employees derive much of their self-worth from their ability to consistently over-deliver at speed, so they are deeply frustrated at having to spend a significant proportion of their time trawling through Slack threads, Jira tickets and internal Confluence pages just to locate the information they need to support customers. In fact, a recent audit revealed that over 200 hours a week are dedicated to this task alone.

With key customers defecting to competitors in the belief that they can be better served elsewhere, Aurenis’s newly appointed CTO has identified this as a key area to address in order to reverse the company’s fortunes and, on a more personal level, to make a major impact early in his tenure.

Problem Statement

Fundamentally, this project seeks to explore the extent to which the firm can reduce the time wasted locating knowledge across tools, while improving employee morale and customer satisfaction.

In addition, the project set out to investigate a number of key questions which, together, define the scope and criteria by which the success of the RAG Search Assistant will be evaluated.

Success Criteria

The success of this project will be determined by the following criteria:

  1. A fully secure system only accessible to authorised users.
  2. A user-friendly interface with low latency which encourages fast adoption.
  3. A stable and dependable resource that staff can consistently rely on.
  4. High levels of factual accuracy and contextual awareness in responses.
  5. The ability to tailor answers precisely to the user’s question and intent.

These criteria will guide how the project’s impact is assessed following implementation.

Functional Requirements

Non-Functional Requirements

Feature-to-Requirement Mapping

| Requirement | Mapped Feature / Service | Category |
| --- | --- | --- |
| User Authentication | Amazon Cognito Hosted UI (PKCE) | Security |
| Secure API Access | API Gateway JWT Authorizer | Security |
| Rate Limiting & DDoS Protection | CloudFront + AWS WAF | Security |
| Data Encryption (In Transit) | HTTPS via CloudFront + API Gateway | Security |
| Data Encryption (At Rest) | Pinecone Serverless + S3 Default Encryption | Security |
| Key Rotation | KMS-managed encryption | Security |
| NL Querying | Bedrock Embeddings + RAG Pipeline | Core Functionality |
| Grounded Answers | Pinecone Vector Database | Core Functionality |
| Low Latency | Lambda + Bedrock (Claude Sonnet) | Performance |
| Content Delivery | CloudFront CDN | Performance |
| Serverless Scaling | Lambda | Scalability |
| Internal Throughput | API Gateway HTTP API | Scalability |
| Minimal Management | Fully Serverless Architecture | Ops |
| Simplified Deployment | S3 Static Hosting + Lambda Packaging | Ops |
| Storage | Pinecone + S3 | Storage |

Solution Overview

This section outlines the system design and implementation strategy for the RAG Search Assistant, highlighting architectural decisions, functional workflows, and non-functional considerations.


Key Architectural Decisions

Data Source & Embeddings

Vector Store

Model Strategy

Deployment Strategy


Functional Flow

Phase 1: Data Processing

The following diagram outlines the embedding + indexing workflow:

Data Processing Pipeline

Workflow Summary
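At a high level, the embedding + indexing workflow can be sketched as a chunk → embed → upsert loop. The sketch below is purely illustrative: `embed` stands in for a Bedrock embeddings call, and the commented upsert marks where a Pinecone `index.upsert` would go; all names are hypothetical.

```python
# Hypothetical sketch of the data-processing phase: split documents into
# overlapping chunks, embed each chunk, and prepare records for the vector store.

def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows."""
    chunks = []
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + size])
    return chunks

def embed(chunk: str) -> list[float]:
    # Stub: in the real pipeline this would call a Bedrock embeddings model
    # (e.g. via boto3's bedrock-runtime client) and return a dense vector.
    return [float(len(chunk))]

def index_document(doc_id: str, text: str) -> list[dict]:
    """Build upsert-ready records: one per chunk, with the text as metadata."""
    records = []
    for i, chunk in enumerate(chunk_text(text)):
        records.append({
            "id": f"{doc_id}-{i}",
            "values": embed(chunk),
            "metadata": {"text": chunk},
        })
    # In production: pinecone_index.upsert(vectors=records)
    return records
```

Keeping the original text in each record’s metadata is what later allows retrieved chunks to be pasted directly into a grounded prompt.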

Phase 2: Retrieval and Generation

The diagram below visualises the runtime flow of query resolution:

Retrieval Flow Diagram

Workflow Summary
  1. User submits a query.
  2. App receives and processes query.
  3. Query is embedded → Pinecone top-k retrieval.
  4. Retrieved chunks + query → grounded prompt.
  5. Prompt sent to Bedrock.
  6. Model (e.g., Claude Sonnet) generates response.
  7. Response returned to app.
  8. App returns final answer.
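The steps above can be sketched end-to-end. In this illustrative version the embedding, retrieval and generation calls are stubbed, since in the deployed system they would map to a Bedrock embeddings call, a Pinecone `index.query`, and a Bedrock `invoke_model` call respectively; all names are hypothetical.

```python
def answer_query(query: str, top_k: int = 5) -> str:
    # Steps 1-3: embed the query and retrieve the top-k chunks.
    chunks = retrieve(embed(query), top_k)
    # Step 4: combine retrieved chunks and the query into a grounded prompt.
    prompt = build_grounded_prompt(query, chunks)
    # Steps 5-8: send the prompt to the model (e.g. Claude Sonnet on Bedrock)
    # and return the generated answer to the app.
    return generate(prompt)

def build_grounded_prompt(query: str, chunks: list[str]) -> str:
    """Number each retrieved chunk and instruct the model to stay grounded."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using ONLY the context below. If the answer is not in the "
        f"context, say so.\n\nContext:\n{context}\n\nQuestion: {query}"
    )

# Stubs standing in for Bedrock embeddings, Pinecone retrieval and generation.
def embed(text: str) -> list[float]:
    return [float(len(text))]

def retrieve(vector: list[float], k: int) -> list[str]:
    return ["Aurenis support runbook excerpt"][:k]

def generate(prompt: str) -> str:
    return f"(model output for a {len(prompt)}-char prompt)"
```

The "answer only from context" instruction is the grounding step that distinguishes this flow from a plain LLM call.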

Architectural Sequence & Runtime Architecture

The complete runtime architecture is illustrated below, showing how user identity, front-end routing, application logic, AI inference, and observability layers interact:

Runtime Architecture Diagram

Identity

Edge

Application

Data & AI

Observability & Cost Control


Final Notes

This architecture reflects a scalable, low-maintenance deployment strategy that balances security, performance, and usability. All decisions were made with real-world adoption, developer efficiency, and measurable business impact in mind.

Implementation

Deployment Roadblock: Python Packaging on macOS for Lambda

Surprisingly, the most challenging part of implementation turned out to be deploying the application to AWS Lambda. Uploading the application code from my local machine to Lambda took well over an hour, only to fail repeatedly with `Unable to import module 'backend.main': No module named 'pydantic_core._pydantic_core'`.

This was incredibly frustrating — so much so that I was strongly considering opting for either EC2 or Fargate instead, though I ultimately decided to see it through as serverless compute was identified as a key requirement during discovery.

The issue stemmed from a mismatch between Lambda’s Python 3.12 runtime, which runs on Amazon Linux (x86_64), and my local macOS development environment (ARM64). Core Python packages such as NumPy, Pandas, and Pydantic contain compiled binaries, and therefore must be built on a compatible architecture to run successfully on Lambda.

Thankfully, the solution was relatively straightforward. I spun up a temporary Linux-based EC2 instance specifically to recompile Pydantic and the other impacted libraries. This appears to be a well-known workaround when working with compiled dependencies on Lambda, though it is not always clearly documented. Aside from a few inevitable hiccups assigning the correct IAM role and policies, I was able to repackage the libraries and upload them to Lambda without further issues.

For any Mac-based developer working with compiled Python libraries, this architecture mismatch will require a Linux build environment — whether via EC2, Docker, or another compatible approach.
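As an aside, one way to produce Lambda-compatible wheels without a separate Linux machine (an alternative to the EC2 route taken above, not what was done here) is to ask pip for manylinux x86_64 binaries directly. A rough sketch, with the function name and paths as placeholders:

```shell
# Fetch Linux x86_64 wheels on macOS by targeting the manylinux platform.
# Match --python-version to the Lambda runtime (3.12 here).
pip install \
  --platform manylinux2014_x86_64 \
  --implementation cp \
  --python-version 3.12 \
  --only-binary=:all: \
  --target ./package \
  pydantic

# Zip the packaged dependencies and update the function (placeholder name).
cd package && zip -r ../lambda_package.zip . && cd ..
aws lambda update-function-code \
  --function-name my-rag-backend \
  --zip-file fileb://lambda_package.zip
```

The `--only-binary=:all:` flag is the important part: it refuses to fall back to building from source on the (incompatible) local architecture.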


Attaching WAFs to CloudFront: The us-east-1 Rule and Its Side Effects

The setup of the Web Application Firewall (WAF) presented the next major implementation hurdle. The first issue arose when I mistakenly created a regional WAF in eu-west-2 (London); CloudFront, however, only supports global WAFs created in us-east-1 (N. Virginia). As a result, any attempt to attach the WAF triggered `InvalidParameterException` errors. Matters were made worse by the AWS Console showing the WAF as successfully attached while the CLI reported it as unattached, causing confusion and delaying debugging efforts.

The solution was to recreate the Web ACL in us-east-1 with Scope=CLOUDFRONT, and attach it via the CloudFront Console rather than the CLI, which helped avoid further validation issues. The core CloudFront learning here is that all WAFs must be created and managed in us-east-1 to apply globally; regional WAFs will consistently fail to attach.
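For reference, the corrected creation call looks roughly like this (the ACL and metric names are placeholders):

```shell
# CloudFront-scoped Web ACLs must be created in us-east-1.
aws wafv2 create-web-acl \
  --name rag-assistant-waf \
  --scope CLOUDFRONT \
  --region us-east-1 \
  --default-action Allow={} \
  --visibility-config \
    SampledRequestsEnabled=true,CloudWatchMetricsEnabled=true,MetricName=ragAssistantWaf
```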

Unfortunately, the complications did not end there. Because my rate-limiting rule (100 requests per 5 minutes per IP) lived inside the same WAF, it was also unable to take effect when the WAF failed to attach. This went unnoticed for a considerable period, as unrelated 403 errors created the illusion that the rule was functioning when, in reality, it was not active at all.

Ultimately, the rate-limit rule began functioning automatically once the WAF was properly attached in the correct region.
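For completeness, a rate-based rule matching the limit described above looks roughly like this in WAFv2 rule JSON (rule and metric names are illustrative; WAF evaluates rate-based limits over a rolling five-minute window):

```json
{
  "Name": "rate-limit-per-ip",
  "Priority": 0,
  "Statement": {
    "RateBasedStatement": {
      "Limit": 100,
      "AggregateKeyType": "IP"
    }
  },
  "Action": { "Block": {} },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "rateLimitPerIp"
  }
}
```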


When Authentication Isn't the Problem: Debugging a CloudFront Red Herring

Authentication proved to be by far the most difficult aspect of the entire implementation, as it contained a myriad of issues which, on multiple occasions, felt insurmountable.

  1. Hosted UI: Missing ‘Sign-in’ Behaviour
    The first major blocker came from AWS Cognito’s Hosted UI being completely unresponsive when clicking on the “Sign in” button. This was due to importing the browser build of oidc-client-ts instead of the correct ESM module. The issue was resolved by switching to the proper ESM import:
    
    import { UserManager }
      from "https://cdn.jsdelivr.net/npm/oidc-client-ts@2.0.4/dist/esm/oidc-client-ts.min.js";
        
  2. Callback Page Not Working / Login Loop
    After redirecting from Cognito’s hosted authentication journey, the callback page either did nothing or displayed Error: no authorization code returned. The root causes were incomplete handling of the OAuth code parameter and a slightly incorrect callback path. Switching from a raw token exchange script to the managed userManager.signinCallback() function resolved the issue.
  3. Invalid Scope Error
    A recurring error occurred where sign-ins redirected back to the callback page with:
    
    error=invalid_request&error_description=invalid_scope
        
    This was caused by the Cognito App Client lacking the profile scope — only email and phone had been selected. OIDC requires openid plus at least one of email, profile, or phone. Adding the profile scope resolved the issue fully.
  4. Incorrect Front-End Behaviour Misdiagnosed as Auth Failure
    At this point, it seemed like the entire authentication flow was cursed: the sign-in button didn’t work, the UI didn’t update, redirects failed, and Cognito occasionally displayed error messages. The behaviour was inconsistent and unpredictable, giving the impression of a fundamentally broken authentication system.

    In reality, the authentication pipeline was implemented correctly. The real issue was that the front end was serving a stale JavaScript file. S3 object overwrites were silently failing, causing CloudFront to cache outdated assets. Once the correct JS bundle — with the proper redirectUri, metadata, and module imports — was deployed, the entire flow snapped into place.

    Ultimately, the outdated JS prevented event listeners from attaching, blocked UI updates, and stopped the PKCE flow from triggering. Cognito was throwing errors only because the old bundle was passing malformed redirect URIs. What looked like OAuth failures were, in reality, front-end deployment issues in disguise.
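Pulling these fixes together, the working oidc-client-ts configuration looked roughly like the sketch below; the pool ID, client ID and CloudFront URL are placeholders, not values from the actual deployment.

```javascript
import { UserManager } from "oidc-client-ts";

// Placeholders: substitute your own Cognito user pool, app client and
// CloudFront callback URL.
const userManager = new UserManager({
  authority: "https://cognito-idp.eu-west-2.amazonaws.com/<USER_POOL_ID>",
  client_id: "<APP_CLIENT_ID>",
  redirect_uri: "https://<DISTRIBUTION>.cloudfront.net/callback.html",
  response_type: "code",          // authorization code flow with PKCE
  scope: "openid email profile",  // 'profile' resolved the invalid_scope error
});

// On the callback page, let the library complete the code exchange:
// userManager.signinCallback().then((user) => { /* update UI */ });
```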

This experience was a powerful reminder that in distributed systems, what appears to be an authentication failure may actually be a front-end deployment issue.
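When a stale bundle is the suspect, forcing CloudFront to refetch from S3 is the quickest way to confirm it. A sketch of the deploy-and-invalidate step (bucket name and distribution ID are placeholders):

```shell
# Re-upload the bundle, then bust the CDN cache so clients fetch the new JS.
aws s3 cp ./dist/app.js s3://my-frontend-bucket/app.js \
  --cache-control "no-cache"
aws cloudfront create-invalidation \
  --distribution-id E1234567890ABC \
  --paths "/app.js"
```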

Testing & Results

RAG Search Assistant — UI Snapshot

The screenshot below shows the final user interface for the RAG Search Assistant as deployed during testing. This capture reflects the production-ready layout, authentication controls, model selector, Top-k parameter, and query input surface used to perform evaluation.

RAG Search Assistant UI Screenshot

Reflections & Lessons Learned

Security as a First-Class Concern

Having previously walked away from many projects due to technical stalemates or shifting scope, it was deeply satisfying to see this project through and deliver a tangible product that met the majority of its goals. Fortunately, I was guided by a well-defined process that I have iterated and improved over time, which helped me anticipate critical issues and mitigate them early on.

That said, there were clear opportunities to improve how security was handled. While it was a key consideration at the start of the project, in hindsight it should have been baked into every feature rather than treated as a parallel stream. Many of the implementation challenges ultimately touched on security in one form or another.

Going forward, I’ll take greater care to analyse every component through the lens of its attack surface and potential threat vectors. I also intend to incorporate more rigorous testing and security hardening during design and delivery, in order to avoid the unacceptable outcome of a vulnerability slipping into production.

RAG and the Future of AI at Work

The RAG Search project provided an excellent opportunity to explore some of the core principles businesses grapple with when adopting AI into their organisations. Retrieval-Augmented Generation is clearly a critical enabler for businesses looking to leverage LLMs in ways that reflect internal context and data, while also reducing many of the reliability concerns that typically accompany this technology.

Natural language interfaces are emerging as a central pillar of the AI-native workplace. They offer a powerful, intuitive entry point for employees to perform higher-quality work at speed. In today’s climate, it’s difficult not to capture the interest of business leaders when articulating the concrete, near-term benefits of these tools.

One development I anticipate is the arrival of AI Enablement Officers — a new kind of internal role responsible for helping workforces adopt AI tools effectively. While these technologies are incredibly powerful, they are not self-explanatory. Great care must be taken not only to train users, but to foster a culture of responsible AI usage that acknowledges the risks of over-reliance and the potential loss of personal agency it can bring.

What’s Next

For my next project, I intend to explore many of the themes raised by the RAG Search Assistant in greater depth. I am especially keen to take any available opportunity to better understand how business leaders view LLM adoption, and to identify the practical solutions that enable responsible implementation at scale.

While RAG features heavily in today’s AI zeitgeist, Agentic Orchestration represents the other half of the enterprise AI equation. Where RAG Search helped clarify how businesses can understand what is happening inside their organisations, my next project will focus on Enterprise Execution — specifically, how AI can help organisations act intelligently across systems. This is where agentic systems come into play.

In addition, I am interested in exploring open-source solutions such as LLaMA, DeepSeek, Mistral, and Falcon. I also intend to explore Azure OpenAI, with a view to creating a multi-cloud solution spanning both AWS and Azure. This will provide an excellent opportunity to demonstrate how businesses can safely interact with AI systems in a way that addresses organisational concerns around security, data sovereignty, and compliance.

Request Access to RAG Search Assistant

If you're interested in exploring the live app, complete the form below and we’ll send you an invite.

By submitting this form, you agree that your information will be used solely to review your access request. You may request data deletion at any time by contacting us.