Federal Technology Partnership · Active · May 2024 – Present · Option Year 2 Commencing

AI-Enabled Clinical NLP for Federal Regulatory Review

A responsible, Human-in-the-Loop AI workflow that helps FDA reviewers identify candidate SNOMED CT terms while keeping the reviewer as the final decision maker. It is fully auditable and built without custom model training. Delivered as a working Proof of Concept during Option Year 1 of the parent contract, which is now entering Option Year 2.

Responsible AI · Clinical NLP · Digitization · Federal Modernization · Human-in-the-Loop · Healthcare AI · Workflow Automation

HITL

Human-in-the-Loop

Zero

Custom Model Training Required

Top 5

Ranked SNOMED Candidates per Entity

Person Delivery Team

The Challenge

A Manual Process at the Heart of Regulatory Review

At the FDA, clinical reviewers were responsible for mapping unstructured medical text from pharmaceutical submissions to standardized SNOMED CT terminology codes. The workflow was entirely manual, dependent on specialized clinical knowledge, slow to execute, and difficult to standardize.

The result: a bottleneck at the center of mission-critical regulatory workflows, with no scalable path forward and growing pressure to modernize without compromising accuracy or oversight.

✗ Subject matter experts manually interpreted thousands of unstructured clinical indications from regulatory submission forms, a process entirely dependent on scarce specialist knowledge.

✗ Manual SNOMED CT terminology searches, run one query at a time, created a bottleneck in mission-critical regulatory review workflows.

✗ Manual classification introduced variability in outcomes: different reviewers could reach different mappings for the same clinical text.

✗ No standardized scoring or audit trail meant outputs were difficult to validate, trace, or defend in a regulatory context.

✗ Scaling the process required adding more expert reviewers: an expensive, slow, and unsustainable path.

Our Approach

Evaluate First. Build with Confidence.

We didn't start by building. We started by understanding the problem and evaluating multiple AI approaches before committing to a path aligned with the agency's constraints and mission.

Phase 1

Discovery & Technology Evaluation

We evaluated three categories of AI before choosing:

(1) Transformer-based language models, including BERT: rejected because they reason over a narrow span of text rather than the full context behind a clinical indication, and they struggled with the knowledge-driven reasoning this specialized domain requires.

(2) Classical ML (Support Vector Classification over transformer embeddings): rejected because there wasn't enough historical adjudication data to reliably align indications to the correct SNOMED CT codes.

(3) Domain-specialized clinical NLP services: selected. Amazon Comprehend Medical is purpose-built for clinical entity extraction and avoided the need for custom model training, annotation, or ongoing maintenance.

Phase 2

Proof of Concept & Compliance Validation

We ran a controlled PoC against restricted datasets, documenting traceability and having outputs verified by agency reviewers. This phase confirmed accuracy, interpretability, and alignment with federal acquisition standards before any broader rollout.

Phase 3

Architecture Handoff & Ongoing Advisory

With PoC results validated by agency reviewers, elfOvations delivered full architecture documentation, integration specifications, and a working prototype as the blueprint for the agency's delivery team. The Proof of Concept was substantially delivered during Option Year 1 of the parent contract, which is now entering Option Year 2.

“This phased, acquisition-aligned approach ensured responsible, transparent AI adoption, delivering meaningful operational improvements while maintaining the governance and accountability required in a federal regulatory environment.”

Ashish Nagpal, Chief Architect, FDA Program

The Solution

An Intelligent, Human-Centered Adjudication Workflow

We built an AI-enabled SNOMED CT adjudication application that accelerates the reviewer's work without replacing their judgment.

Clinical Entity Extraction

Amazon Comprehend Medical analyzes unstructured clinical text and extracts medical entities (conditions, procedures, anatomy, medications) with associated confidence scores.
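As a rough illustration of this step, the sketch below calls the Comprehend Medical `InferSNOMEDCT` operation (exposed in boto3 as `infer_snomedct`) and flattens the response into a small record per entity. The `ExtractedEntity` type and the `extract_entities` helper are hypothetical names chosen for this example, not the project's actual code. The client is passed in as an argument so the function can be exercised without AWS credentials.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ExtractedEntity:
    """Hypothetical record: one extracted entity plus its candidate concepts."""
    text: str
    category: str
    score: float  # entity-extraction confidence, 0..1
    concepts: tuple[tuple[str, str, float], ...]  # (code, description, concept score)


def extract_entities(client: Any, text: str) -> list[ExtractedEntity]:
    """Call InferSNOMEDCT and keep only the fields a ranking step would need."""
    response = client.infer_snomedct(Text=text)
    entities = []
    for item in response.get("Entities", []):
        concepts = tuple(
            (c["Code"], c["Description"], c["Score"])
            for c in item.get("SNOMEDCTConcepts", [])
        )
        entities.append(
            ExtractedEntity(item["Text"], item["Category"], item["Score"], concepts)
        )
    return entities
```

In a live environment the client would typically be created with `boto3.client("comprehendmedical")`; region, credentials, and error handling are left out of this sketch.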

SNOMED CT Terminology Integration

Real-time integration with SNOMED CT Terminology APIs validates and enriches extracted concepts dynamically, mapping entities to standardized medical codes.

Weighted Confidence Ranking

A composite scoring algorithm combines entity confidence and concept confidence to surface the most relevant SNOMED CT candidates: ranked, explainable, and auditable.

Human-in-the-Loop (HITL) Controls

AI functions strictly as an advisory capability. All final SNOMED CT selections remain with trained human reviewers, preserving accountability and regulatory defensibility.

Architecture Handoff & Integration Design

Delivered full architecture documentation, data flow specifications, and integration design for the agency's delivery team to carry into production, including a working prototype as a reference implementation.

Explainable, Auditable Outputs

Every AI recommendation includes its confidence score, source entities, and ranking rationale, making outputs transparent and traceable, and designed to support OMB M-24-10 federal responsible AI requirements.
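One way such a traceable output could be structured is sketched below. The field names, the `AuditRecord` and `Recommendation` types, and the JSON layout are all assumptions made for illustration; the source describes what each recommendation carries, not how it is stored.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Recommendation:
    """Hypothetical shape of one AI suggestion shown to the reviewer."""
    snomed_code: str
    description: str
    confidence: float
    source_entities: list[str]
    ranking_rationale: str


@dataclass
class AuditRecord:
    """Hypothetical audit entry: what the AI advised and what the human decided."""
    indication_text: str
    recommendations: list[Recommendation]
    reviewer_id: Optional[str] = None
    selected_code: Optional[str] = None
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def record_decision(self, reviewer_id: str, selected_code: str) -> None:
        # A decision without a named reviewer is rejected: the AI only advises.
        if not reviewer_id:
            raise ValueError("a named reviewer is required for every selection")
        self.reviewer_id = reviewer_id
        self.selected_code = selected_code

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)
```

The sketch deliberately lets the reviewer select a code that was not among the recommendations, since the reviewer, not the shortlist, is the final authority.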


The Impact

Faster. More Consistent. Fully Auditable.

Top 5

Ranked SNOMED candidates per entity

Each candidate carries an entity-extraction and SNOMED-concept confidence score

Zero custom model training

No annotation, no retraining, no ongoing ML maintenance required

100%

Human oversight retained

AI surfaces candidates; reviewers decide. Every time.

Person delivery team

Solution architecture, full-stack, AWS/AI, QA, business analysis, and coordination roles

How the ranking works

Amazon Comprehend Medical extracts clinical entities from submission text and returns candidate SNOMED CT concepts with confidence scores. A weighted-ranking model combines entity-extraction confidence with SNOMED-concept confidence to surface the top 5 candidates per entity, ranked and scored. The reviewer works from that shortlist and makes the final Primary and Secondary term selections; nothing is auto-adjudicated.
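The ranking step described above can be sketched as a weighted sum over the two confidence values. The source does not publish the actual formula or weights, so the linear form and the `ENTITY_WEIGHT` / `CONCEPT_WEIGHT` values below are assumptions, and `rank_candidates` is a hypothetical helper. The input dictionary follows the shape of one entry in the `Entities` list returned by `InferSNOMEDCT`.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    code: str
    description: str
    entity_score: float
    concept_score: float
    composite: float


# Assumed weights for illustration only; the real values are not given in the source.
ENTITY_WEIGHT = 0.4
CONCEPT_WEIGHT = 0.6


def rank_candidates(entity: dict, top_n: int = 5) -> list[Candidate]:
    """Rank one entity's SNOMED CT concepts by a weighted sum of both confidences."""
    entity_score = entity["Score"]
    ranked = [
        Candidate(
            code=c["Code"],
            description=c["Description"],
            entity_score=entity_score,
            concept_score=c["Score"],
            composite=ENTITY_WEIGHT * entity_score + CONCEPT_WEIGHT * c["Score"],
        )
        for c in entity.get("SNOMEDCTConcepts", [])
    ]
    # Highest composite first; the reviewer only ever sees the top_n shortlist.
    ranked.sort(key=lambda cand: cand.composite, reverse=True)
    return ranked[:top_n]
```

With a linear form like this, the entity score is constant within a single entity, so the order inside one shortlist follows the concept score. The composite becomes useful when comparing or thresholding candidates across entities, and keeping both component scores on each candidate lets the reviewer see why it ranked where it did.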

Top 5 ranked candidates per entity (+ confidence score) → Reviewer makes the final call

AWS Partnership Impact

When Amazon Comprehend Medical's entity ranking didn't always surface the most clinically relevant term first, our team escalated the issue directly to AWS through multiple working sessions rather than working around the limitation. The ranking behavior improved as a result.

Responsible AI in Practice

Every AI recommendation in this system includes its confidence score, source entities, and ranking rationale. The AI never decides; it advises. This design reflects our commitment to building AI that augments human expertise rather than replacing it.

Technologies Used

The Stack Behind the Solution

AI / NLP

Amazon Comprehend Medical
SNOMED CT Terminology APIs
Custom NLP Orchestration (Python)
Weighted Ranking & Confidence Scoring

Platform

Python / Django
SQLite
REST APIs
Figma (End-to-End Mockups)

Responsible AI & Governance

OMB M-24-10 Responsible AI Alignment
Human-in-the-Loop (HITL) Architecture
Explainable, Confidence-Scored Output
Weighted SNOMED Concept Ranking

For Federal Program Managers

Federal-Specific Considerations

How this capability gets procured, authorized, and sustained in a federal environment.

Procurement Path

This capability was delivered as a technology subcontract through an established federal prime contractor on an active task order. It is available to prime contractors as an AI/NLP subcontract under NAICS 541512 or 541511. Open market delivery is also available for agencies with direct contract authority.

Responsible AI & OMB M-24-10 Alignment

Every AI recommendation carries a confidence score, source entities, and ranking rationale, and final SNOMED term selection always rests with the human reviewer. This Human-in-the-Loop design is built to support OMB M-24-10 responsible AI requirements, giving contracting officers and ATO reviewers a documented human-decision-authority story from day one.

Agency Adoption Lessons

A phased rollout was essential: PoC first with a restricted dataset, validated by agency SMEs, before any broader deployment. This approach built trust with reviewers who were initially skeptical of AI-assisted classification. The key adoption lesson: present the AI as a tool that makes reviewers' expertise more effective, not a system that replaces their judgment.

Have a Similar Challenge?

Whether it's a manual process that needs AI, a legacy workflow that needs modernization, or a compliance requirement that needs architecture, we'd love to hear about it.
