Intelligent Insurance Claim Processing Agent — Agentic RAG for Auto Insurance

Intelligent Insurance Claim Processing Agent - Agentic RAG by Ali Zarreh

Explore the Intelligent Insurance Claim Processing Agent, a production-grade Agentic RAG (Retrieval-Augmented Generation) system that automates the evaluation of auto insurance claims. By combining semantic policy retrieval, CSV-backed validation, real-time web search for repair cost estimation, and two interchangeable AI pipelines, the system delivers structured, auditable coverage decisions, turning a labor-intensive manual process into a transparent automated workflow.

Project Overview

The Claim Processing Agent ingests auto insurance claims as structured JSON, validates them against policy records, retrieves relevant policy language from a ChromaDB vector store, estimates market repair costs via DuckDuckGo web search, and produces structured ClaimDecision objects — all orchestrated by an AI agent. The system features two fully interchangeable pipelines: a deterministic LangChain/LangGraph state graph and an autonomous Smolagents agent, selectable via a single Hydra configuration change. With 58 automated tests, a FastAPI REST backend, a Streamlit UI with reasoning trace viewer, and Docker deployment, the project represents a complete production-grade agentic system built on modern AI engineering practices.

Technology Stack & Architecture

Core Technologies & Architectural Decisions

  • Dual Agentic Pipelines: LangChain/LangGraph stateful graph and Smolagents autonomous agent — fully interchangeable via Hydra config with no code changes required
  • Agentic RAG: ChromaDB vector store with configurable embeddings (OpenAI text-embedding-3-small or all-MiniLM-L6-v2, set via vectordb.embedding_model) for semantic retrieval of policy document chunks
  • REST Backend: FastAPI with Pydantic v2 request/response schemas, structured middleware for logging and error handling, and async Uvicorn serving
  • Interactive Frontend: Streamlit UI with claim form (manual and JSON input modes), coverage decision result card, and step-by-step agent reasoning trace viewer
  • Configuration Management: Hydra / OmegaConf for structured YAML config — all pipeline, LLM, vector store, and server settings externalized
  • Production Deployment: Multi-stage Docker builds with Docker Compose orchestration, environment variable injection, and health-check configuration
  • Code Quality: Poetry packaging, Ruff linting/formatting, 58 pytest tests, MkDocs documentation, and Loguru structured logging

AI Architecture Innovation

  • Deterministic LangGraph Orchestration: Eight-node StateGraph with explicit conditional edges encoding business logic — invalid policy and inflated cost checks cause deterministic graph branching
  • Autonomous Smolagents Alternative: ToolCallingAgent with custom system prompt enforcing the sequential workflow, offering a flexible autonomous alternative to the graph-based approach
  • Repair Cost Intelligence: Real-time DuckDuckGo web search estimates market repair costs from each claim description and flags claims that exceed the market estimate by more than 40%
  • Persistent Vector Store: Load-or-build pattern that detects an existing ChromaDB collection and skips re-embedding on subsequent runs, reducing startup overhead
  • Full Execution Tracing: Every graph node records its name, entry timestamp, and elapsed time in a trace list — surfaced in the Streamlit UI for complete observability
  • Pipeline Factory Pattern: Lazy-import factory reads cfg.pipeline.type and instantiates the correct pipeline, allowing runtime swapping with no dependency overhead
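
The load-or-build pattern listed above can be illustrated without ChromaDB. The following is a minimal sketch, assuming a JSON file stands in for the persisted collection; load_or_build, embed_chunks, and the file name are hypothetical and not taken from the project.

```python
import json
import tempfile
from pathlib import Path


def embed_chunks(chunks):
    """Stand-in for the expensive embedding step (hypothetical)."""
    # A real system would call an embedding model here.
    return [{"text": c, "vector": [float(len(c))]} for c in chunks]


def load_or_build(store_path, chunks):
    """Return (records, built). Reuse the persisted store when it exists."""
    path = Path(store_path)
    if path.exists():
        # Existing store found: skip re-embedding entirely.
        return json.loads(path.read_text()), False
    records = embed_chunks(chunks)
    path.write_text(json.dumps(records))
    return records, True


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "policy_store.json"
    first, built_first = load_or_build(store, ["collision clause", "deductible clause"])
    # The second call finds the file and never touches embed_chunks.
    second, built_second = load_or_build(store, ["ignored on second run"])
```

Only the first call pays the embedding cost; later calls read what was persisted, which is the startup saving the bullet describes.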

Technical Innovation & System Design

LangGraph Claim Processing Workflow

The LangChain pipeline implements an eight-node StateGraph that maps directly to the business workflow. Routing between nodes is deterministic: the same validation and price-check outcomes always follow the same path.

Processing Nodes & Conditional Edges:

  • parse_claim: Validates incoming JSON against the ClaimInfo Pydantic schema, surfacing malformed input early
  • validate_claim: Checks policy existence, premium dues status, and coverage date range in the CSV records — conditional edge routes to finalize_invalid on failure
  • check_policy: LLM generates 3–5 targeted semantic search queries from claim details; ChromaDB returns the most relevant policy text chunks
  • price_check: DuckDuckGo web search retrieves market repair cost estimates — conditional edge routes to finalize_inflated when the claimed amount exceeds market by more than 40%
  • generate_recommendation: GPT-4o synthesizes claim data, retrieved policy text, and cost validation into a structured PolicyRecommendation
  • finalize_decision / finalize_invalid / finalize_inflated: Three terminal nodes assembling the final ClaimDecision with appropriate coverage flags, deductible, payout, and notes
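
The branching described in this list can be sketched in plain Python, without LangGraph. This is an illustration only: the node names come from the list above, but the edge order, the state keys policy_valid and inflated, and the walk helper are assumptions.

```python
def route_after_validation(state):
    """Conditional edge after validate_claim."""
    return "check_policy" if state["policy_valid"] else "finalize_invalid"


def route_after_price_check(state):
    """Conditional edge after price_check."""
    return "finalize_inflated" if state["inflated"] else "generate_recommendation"


# One router per node; None marks a terminal node.
EDGES = {
    "parse_claim": lambda s: "validate_claim",
    "validate_claim": route_after_validation,
    "check_policy": lambda s: "price_check",
    "price_check": route_after_price_check,
    "generate_recommendation": lambda s: "finalize_decision",
    "finalize_decision": None,
    "finalize_invalid": None,
    "finalize_inflated": None,
}


def walk(state):
    """Return the ordered node names visited for a given state."""
    node, path = "parse_claim", []
    while node is not None:
        path.append(node)
        router = EDGES[node]
        node = router(state) if router else None
    return path


valid_path = walk({"policy_valid": True, "inflated": False})
invalid_path = walk({"policy_valid": False, "inflated": False})
inflated_path = walk({"policy_valid": True, "inflated": True})
```

Because the routers depend only on fields in the state, identical validation and price-check results always produce the same path, regardless of what the LLM wrote elsewhere.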

Smolagents Autonomous Pipeline:

  • ToolCallingAgent Orchestration: Five registered tools covering parsing, validation, query generation, retrieval, and recommendation — the agent autonomously decides execution order within guardrails
  • Custom Prompt Engineering: Structured system prompt enforces the sequential workflow, preventing the agent from skipping validation or retrieval steps
  • Schema Guardrails: Agent output is validated against the ClaimDecision Pydantic schema with one automatic retry on malformed responses
  • OpenAIServerModel Backend: Provider-agnostic LLM interface supporting GPT-4o-mini, with extensibility for Anthropic Claude or local Ollama models
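
The schema guardrail with a single retry might look roughly like the sketch below. It uses a stdlib field check as a stand-in for the Pydantic ClaimDecision model; the field names, run_with_retry, and the fake agent are hypothetical.

```python
import json

# Hypothetical field set; the real ClaimDecision schema may differ.
REQUIRED_FIELDS = {"claim_number", "covered", "deductible", "payout", "notes"}


def validate_decision(raw):
    """Parse raw agent output and check required fields, or raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return data


def run_with_retry(call_agent, prompt, max_retries=1):
    """Call the agent; on malformed output, re-prompt with the error once."""
    attempt_prompt = prompt
    for attempt in range(max_retries + 1):
        raw = call_agent(attempt_prompt)
        try:
            return validate_decision(raw)
        except ValueError as err:
            if attempt == max_retries:
                raise  # retries exhausted: surface the schema error
            attempt_prompt = f"{prompt}\n\nPrevious output was rejected: {err}. Return valid JSON."


# Fake agent: first reply is malformed, second is well-formed.
good = {"claim_number": "CLM-1", "covered": True, "deductible": 250.0,
        "payout": 500.0, "notes": "ok"}
replies = iter(["not json", json.dumps(good)])
calls = []


def fake_agent(prompt):
    calls.append(prompt)
    return next(replies)


decision = run_with_retry(fake_agent, "Evaluate claim CLM-1")
```

Feeding the validation error back into the retry prompt is one design option; a simpler variant would resend the original prompt unchanged.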

Technical Excellence & Innovation Leadership

Measurable Technical Achievements

Delivered quantifiable results demonstrating advanced agentic AI engineering and insurance domain expertise:

  • 58 Automated Tests: Comprehensive pytest suite covering schemas, validation logic, retrieval accuracy, both pipeline implementations, and FastAPI endpoints
  • Dual Pipeline Architecture: Two fully interchangeable orchestration strategies — deterministic LangGraph graph and autonomous Smolagents agent — selectable without code changes
  • Fraud Detection via Web Search: Real-time DuckDuckGo integration flags inflated repair cost claims by comparing against market estimates, adding an AI-powered anti-fraud layer
  • Full Reasoning Transparency: Step-by-step agent trace surfaced in the Streamlit UI, enabling auditors and adjusters to inspect every decision the agent made
  • Sub-Second Policy Retrieval: Persistent ChromaDB load-or-build pattern reuses the stored collection after the initial ingestion run, so the vector store loads without re-embedding

Architectural Excellence & System Design

Engineered a clean separation of concerns across all system layers:

  • Layered Architecture: Streamlit frontend communicates exclusively via REST API, with no direct Python imports from the backend, keeping frontend and backend as separately deployable services
  • Abstract Pipeline Interface: BasePipeline(ABC) contract enforces a consistent process_claim() API across both pipeline implementations
  • Core Logic Isolation: Validation, ingestion, and retrieval are plain Python functions in src/claim_agent/core/ — framework-agnostic and directly unit-testable
  • Configuration-Driven Behavior: Hydra YAML controls pipeline type, LLM model, embedding model, chunk sizes, retrieval depth, and server settings — no hardcoded values anywhere
  • Structured Observability: Loguru logging with contextual fields (claim number, pipeline type, tool name, timing) for production-grade diagnostics

AI Engineering & LLM Orchestration

Pioneered practical agentic RAG patterns for insurance domain automation:

  • LLM-Powered Query Generation: GPT-4o generates 3–5 targeted semantic search queries from claim context, aiming retrieval at the relevant policy clauses rather than relying on keyword matching
  • Grounded Recommendation Generation: Coverage recommendations are anchored to retrieved policy text chunks, preventing hallucination on legally sensitive insurance language
  • Multi-Framework Competency: Side-by-side implementations in both LangChain/LangGraph and Smolagents demonstrate deep familiarity with the current agentic AI ecosystem
  • Strict Pydantic Validation: All inputs and outputs enforced through Pydantic v2 models, catching schema violations at system boundaries before they propagate
  • Conditional Business Logic as Graph Edges: Insurance adjudication rules (policy validity, cost inflation thresholds) encoded as LangGraph conditional edges rather than buried in LLM prompts

Quality Assurance & Production Excellence

Comprehensive Testing Framework

  • 58 Automated Tests: Full pytest suite across schemas, claim validation, ChromaDB retrieval, LangChain pipeline, Smolagents pipeline, and FastAPI routes
  • Scenario-Based Validation: Four representative claim fixtures covering valid claims, expired policies, invalid policies, and inflated repair costs — ensuring all graph branches are exercised
  • Async API Testing: pytest-asyncio with httpx test client validates REST endpoint behavior including error handling and malformed input responses
  • Ruff Zero-Warning Codebase: Enforced import sorting, type annotation checks, and best-practice linting rules across the entire codebase
  • Makefile Automation: Single-command workflows — make test, make lint, make format, make ingest, make docker-build — streamlining the full development lifecycle

Engineering Excellence & Problem-Solving Innovation

Dual Pipeline Architecture with Single Interface

Challenge: Supporting two fundamentally different orchestration paradigms (explicit graph vs. autonomous agent) without forcing consumers to know which pipeline is active.

Solution: Designed the BasePipeline(ABC) abstract interface with a process_claim() contract, a lazy-import pipeline factory reading from Hydra config, and completely independent tool implementations in each pipeline's subdirectory, with zero shared mutable state between pipelines.
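
A minimal sketch of the interface-plus-factory idea, under stated assumptions: BasePipeline and process_claim() are named in the text, while the stub classes, the builder functions, and the plain dict standing in for the Hydra config are hypothetical.

```python
from abc import ABC, abstractmethod


class BasePipeline(ABC):
    """Contract shared by both pipelines."""

    @abstractmethod
    def process_claim(self, claim: dict) -> dict:
        ...


class LangChainPipeline(BasePipeline):
    def process_claim(self, claim):
        return {"claim_number": claim["claim_number"], "pipeline": "langchain"}


class SmolagentsPipeline(BasePipeline):
    def process_claim(self, claim):
        return {"claim_number": claim["claim_number"], "pipeline": "smolagents"}


def _build_langchain(cfg):
    # In a real project the framework import would sit inside this function,
    # so it is only paid for when this pipeline is selected.
    return LangChainPipeline()


def _build_smolagents(cfg):
    return SmolagentsPipeline()


_BUILDERS = {"langchain": _build_langchain, "smolagents": _build_smolagents}


def create_pipeline(cfg):
    """Instantiate the pipeline named by cfg['pipeline']['type']."""
    pipeline_type = cfg["pipeline"]["type"]
    try:
        builder = _BUILDERS[pipeline_type]
    except KeyError:
        raise ValueError(f"unknown pipeline type: {pipeline_type!r}") from None
    return builder(cfg)


pipeline = create_pipeline({"pipeline": {"type": "smolagents"}})
result = pipeline.process_claim({"claim_number": "CLM-001"})
```

Callers depend only on BasePipeline, so changing the configured type is the only step needed to swap implementations.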

Repair Cost Fraud Detection

Challenge: Auto insurance claims frequently include inflated repair cost estimates. Detecting fraud without access to internal adjuster databases requires creative tooling.

Solution: Integrated DuckDuckGo web search as a real-time market intelligence tool. The agent searches for repair cost estimates matching the claim's vehicle and damage description, then flags claims where the submitted estimate exceeds market rates by more than 40%, triggering the finalize_inflated graph path.
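
The 40% rule reduces to a short piece of arithmetic. The sketch below assumes the threshold is applied to the relative excess over the market estimate and that "more than 40%" is a strict comparison; the function names are illustrative.

```python
def inflation_ratio(claimed, market_estimate):
    """Relative excess of the claimed amount over the market estimate."""
    if market_estimate <= 0:
        raise ValueError("market estimate must be positive")
    return (claimed - market_estimate) / market_estimate


def is_inflated(claimed, market_estimate, threshold=0.40):
    """True when the claim exceeds the market estimate by more than threshold."""
    return inflation_ratio(claimed, market_estimate) > threshold


# Worked example against a 1000.00 market estimate:
at_threshold = is_inflated(1400.0, 1000.0)     # excess is exactly 40%
above_threshold = is_inflated(1500.0, 1000.0)  # excess is 50%
```

With a market estimate of 1000, a claim of 1400 sits exactly at the threshold and is not flagged, while a claim of 1500 exceeds it by 50% and would take the finalize_inflated path.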

Deterministic Agent Tracing for Auditability

Challenge: Insurance coverage decisions require full audit trails; "the AI said so" is not acceptable in regulated environments.

Solution: Every LangGraph node appends a structured entry to the trace list in ClaimState, recording node name, timestamp, inputs, and elapsed time. The FastAPI response includes the full trace, and the Streamlit trace viewer renders it step-by-step, giving adjusters a complete view of every decision the agent made.
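
One way to get such trace entries is a decorator around each node function. The sketch below is an assumption about the mechanism, not the project's code: it uses a plain dict for the state, and the traced decorator and the stub node bodies are hypothetical.

```python
import time
from datetime import datetime, timezone
from functools import wraps


def traced(node_fn):
    """Wrap a node so each call appends one entry to state['trace']."""
    @wraps(node_fn)
    def wrapper(state):
        started_at = datetime.now(timezone.utc).isoformat()
        # Snapshot the inputs, leaving out the trace itself.
        inputs = {k: v for k, v in state.items() if k != "trace"}
        t0 = time.perf_counter()
        result = dict(node_fn(state))
        entry = {
            "node": node_fn.__name__,
            "timestamp": started_at,
            "inputs": inputs,
            "elapsed_s": time.perf_counter() - t0,
        }
        result["trace"] = list(state.get("trace", [])) + [entry]
        return result
    return wrapper


@traced
def parse_claim(state):
    return {**state, "parsed": True}


@traced
def validate_claim(state):
    # Stub rule for illustration only.
    return {**state, "policy_valid": state["policy_number"].startswith("POL")}


state = {"claim_number": "CLM-001", "policy_number": "POL-123"}
for node in (parse_claim, validate_claim):
    state = node(state)
```

After the loop, state["trace"] holds one entry per node in execution order, which is the shape a UI trace viewer could render step by step.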

Framework-Agnostic Core Logic

Challenge: Keeping domain logic (policy validation, PDF ingestion, vector retrieval) portable across both pipeline frameworks without code duplication.

Solution: Implemented all business logic as plain Python functions in src/claim_agent/core/, with no LangChain or Smolagents imports, then wrapped them with framework-specific tool decorators in each pipeline's tools.py. This allows the core logic to be unit-tested in complete isolation from the agent frameworks.
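
As an illustration of framework-free core logic, here is a sketch of a policy validation function that depends only on the standard library. The CSV column names, the sample rows, and the function signatures are assumptions; the three checks (existence, premium dues, coverage dates) follow the validate_claim description earlier in the document.

```python
import csv
import io
from datetime import date

# Hypothetical CSV layout; the real policy records may use other columns.
POLICY_CSV = """policy_number,premium_dues_remaining,coverage_start,coverage_end
POL-001,False,2024-01-01,2024-12-31
POL-002,True,2024-01-01,2024-12-31
"""


def load_policies(csv_text):
    """Index policy rows by policy number."""
    return {row["policy_number"]: row for row in csv.DictReader(io.StringIO(csv_text))}


def validate_claim(policy_number, loss_date, policies):
    """Return (is_valid, reason) for a claim against the policy records."""
    record = policies.get(policy_number)
    if record is None:
        return False, "policy not found"
    if record["premium_dues_remaining"].strip().lower() == "true":
        return False, "premium dues outstanding"
    start = date.fromisoformat(record["coverage_start"])
    end = date.fromisoformat(record["coverage_end"])
    if not (start <= loss_date <= end):
        return False, "loss date outside coverage period"
    return True, "ok"


policies = load_policies(POLICY_CSV)
ok = validate_claim("POL-001", date(2024, 6, 1), policies)
```

Because nothing here imports an agent framework, the function can be tested with plain assertions and then wrapped by whichever tool decorator a pipeline needs.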

Flexible Configuration & Tuning

Every aspect of the system is configurable through Hydra YAML, enabling rapid experimentation without code changes:

Parameter           | Config Key                   | Default          | Effect
Pipeline type       | pipeline.type                | langchain        | Switch between LangGraph and Smolagents
LLM model           | llm.model                    | gpt-4o-mini      | Swap to gpt-4o, Claude, Ollama, etc.
Temperature         | llm.temperature              | 0.1              | Lower = more deterministic decisions
Max tokens          | llm.max_tokens               | 4096             | Cap on recommendation length
Embedding model     | vectordb.embedding_model     | all-MiniLM-L6-v2 | Swap for OpenAI or other embeddings
Chunk size          | vectordb.chunk_size          | 500              | Larger = more context per retrieval chunk
Retrieval depth     | vectordb.n_results           | 5                | More results = richer policy context
Inflation threshold | pipeline.inflation_threshold | 0.40             | Flag claims exceeding the market estimate by more than this fraction (0.40 = 40%)
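
Hydra accepts key=value overrides on the command line. The sketch below imitates that behavior with plain dictionaries to show how the defaults in the table could be changed without touching code; apply_overrides is a hypothetical helper, not Hydra's API, and it handles only two-level keys.

```python
import copy

# Defaults copied from the table above.
DEFAULTS = {
    "pipeline": {"type": "langchain", "inflation_threshold": 0.40},
    "llm": {"model": "gpt-4o-mini", "temperature": 0.1, "max_tokens": 4096},
    "vectordb": {"embedding_model": "all-MiniLM-L6-v2", "chunk_size": 500, "n_results": 5},
}


def apply_overrides(defaults, overrides):
    """Return a new config with 'section.key=value' overrides applied."""
    cfg = copy.deepcopy(defaults)
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        section, _, key = dotted_key.partition(".")
        if section not in cfg or key not in cfg[section]:
            raise KeyError(f"unknown config key: {dotted_key}")
        # Cast the string to the type of the existing default value.
        cfg[section][key] = type(cfg[section][key])(raw)
    return cfg


cfg = apply_overrides(DEFAULTS, ["pipeline.type=smolagents", "vectordb.chunk_size=800"])
```

Here cfg selects the Smolagents pipeline with 800-character chunks, while DEFAULTS itself is left unchanged.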

Scalability Vision & Future Roadmap

Planned Enhancements

  • Multi-Line Insurance Support: Extend beyond auto insurance to home, life, and health claims with domain-specific validation rules and policy corpora
  • LLM-as-Judge Evaluation: Automated scoring of coverage recommendation quality — groundedness, completeness, and regulatory compliance — for continuous quality monitoring
  • Human-in-the-Loop Escalation: Configurable confidence thresholds that pause the pipeline and route borderline claims to a human adjuster queue before finalizing decisions
  • Structured Output Streaming: Server-Sent Events (SSE) from FastAPI for real-time step-by-step trace streaming to the Streamlit UI during long-running agent runs
  • Multi-Modal Claim Intake: Accept photo evidence and repair shop invoices alongside JSON claims, using vision models to cross-validate damage descriptions against submitted images
  • Enterprise Auth & Multi-Tenancy: JWT-based authentication with per-tenant policy corpora, enabling deployment across multiple insurance carriers from a single platform

Leadership Impact & Technical Excellence

The Intelligent Insurance Claim Processing Agent represents a complete demonstration of production-grade agentic AI engineering — from LangGraph state machine design and ChromaDB RAG pipelines to FastAPI backend development, Streamlit UI construction, and Docker deployment. The dual-pipeline architecture, with both a deterministic LangGraph graph and an autonomous Smolagents agent sharing the same interface, showcases deep familiarity with the modern agentic AI ecosystem and the engineering discipline to keep complex systems clean and testable.

The project's emphasis on auditability and transparency — full execution traces, Pydantic-validated outputs at every boundary, structured logging, and a reasoning trace viewer in the UI — reflects the real-world requirements of deploying AI in regulated industries where "the model decided" is never a sufficient explanation. Every architectural decision prioritizes explainability alongside automation.

Quantified Technical Achievements: 58 automated tests across five test modules, two fully interchangeable agentic pipelines, real-time web search fraud detection, sub-second vector store retrieval, and single-command Docker deployment. This project exemplifies the intersection of agentic AI architecture, software engineering rigor, and practical insurance domain expertise.
