Omar Al-Shammary
I build production AI/ML infrastructure that scales, saves money, and ships.
Serverless Data Platforms · Enterprise RAG Systems · MLOps · Cost-Optimized Cloud Architectures
Tri-State Area
$2M
AI infrastructure funding secured through technical leadership
95%
Infrastructure cost reduction — $24K to $90/month
1,100+
Daily users served, 220K+ annual queries at <1min latency

Work Experience


The Cigna Group

Machine Learning Engineer

Built a production data pipeline processing 80K+ Protobuf files across 320+ DynamoDB tables on AWS (S3, Lambda, DynamoDB), writing Lambda functions for deserialization, transformation, and batch writes, and reducing infrastructure costs by 95% ($24K/month to ~$90/month). Helped diagnose a DynamoDB hot partition issue by adding phase-level CloudWatch logging that exposed uneven write distribution; the team then refactored to composite partition keys, achieving a 40% throughput improvement. Built observability infrastructure using CloudWatch and Athena for Lambda execution metrics, enabling data-driven optimization decisions that contributed to FY2026 executive budget approval.
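One common way to relieve a hot partition is to spread writes for a busy key across several suffixed partition keys. The sketch below is illustrative only: the shard count, the key format, and the `composite_partition_key` and `shard_distribution` helpers are assumptions, not the schema the team actually shipped.

```python
import hashlib
from collections import Counter

SHARD_COUNT = 8  # assumed value; would be tuned to the table's write volume


def composite_partition_key(entity_id: str, record_id: str, shards: int = SHARD_COUNT) -> str:
    """Build 'entity_id#shard', deriving the shard deterministically from record_id."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shards
    return f"{entity_id}#{shard}"


def shard_distribution(entity_id: str, record_ids, shards: int = SHARD_COUNT) -> Counter:
    """Count how many records land on each composite key, to check for skew."""
    return Counter(composite_partition_key(entity_id, r, shards) for r in record_ids)
```

Because the shard is derived from a hash of the record ID, the same record always maps to the same key, while a large set of records spreads across all shards. Reads for one entity then need to query every shard suffix and merge the results.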

Bloomfield, CT

Generative AI Engineer

Built ETL pipelines from 5 source systems (Snowflake, Databricks, Oracle, Confluence, GitHub) for MetagenAI, normalizing schemas into a common model and loading into Neo4j with OpenAI embeddings for hybrid retrieval (vector similarity plus graph traversal). Implemented a golden-questions drift detection framework that caught a real embedding drift issue before users were impacted. Contributed to the $2M funding presentation that enabled AI department growth from 5 to 15+. Migrated ETL from Lambda to Glue, achieving a 30% efficiency improvement and zero timeout incidents while processing 100K+ records daily for 200+ engineering teams. Participated in resolving a critical production outage within 2 hours alongside 100+ participants, contributing to root cause analysis that prevented $500K+ in claims processing errors.

Bloomfield, CT

Data Analyst

Built 4 Databricks pipeline jobs and FastAPI endpoints for the Provider Inquiry Tool (PIT) serving 1,100+ daily users with 400K+ monthly queries at 69ms response time. Implemented DynamoDB for key-value lookups and OpenSearch for full-text search within the existing architecture. Built Splunk monitoring dashboards for API performance tracking. Contributed to cloud strategy initiatives achieving a 15% reduction in data incidents and preventing $2.5M in budget overruns.

Bloomfield, CT

Data Analyst

Analyzed 2.5M+ longitudinal patient records using regression analysis and exploratory time series techniques to identify trends in post-COVID outcomes. Built predictive models using gradient boosting and segmented high-risk cohorts using K-means clustering and KNN classification, informing targeted interventions that lowered hospitalization rates by 10%. Led development of the project that won 1st place in the TECDP Summer Innovation Project 2022, recognized by CEO David Cordani during a company-wide town hall, securing a full-time return offer.

Bloomfield, CT

My Systemic Workflow

Scheduling & Orchestration

The Airflow DAGs on CollateralIQ handle claim records in batches across parallel DAG instances. Data flows from Oracle and Teradata into S3 and is validated at every step. The system runs on a schedule, with no data lost and no records silently dropped. This is what let the team scale from Virginia to CT, NY, TX, and FL.
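A "no records silently dropped" guarantee usually comes down to a reconciliation check between what was extracted and what was loaded. Here is a minimal sketch in plain Python, independent of Airflow itself. The `make_batches` and `reconcile` helpers and the `claim_id` field are hypothetical names for illustration, not the actual DAG tasks.

```python
from typing import Iterable, Iterator


def make_batches(records: list, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of at most batch_size records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def reconcile(source_ids: Iterable[str], loaded_ids: Iterable[str]) -> None:
    """Raise if any source record ID is absent from the loaded set."""
    missing = set(source_ids) - set(loaded_ids)
    if missing:
        sample = sorted(missing)[:5]
        raise RuntimeError(f"{len(missing)} record(s) missing after load, e.g. {sample}")


# Example run with synthetic claim records.
records = [{"claim_id": f"C{i}"} for i in range(10)]
batches = list(make_batches(records, batch_size=4))
loaded = [r["claim_id"] for batch in batches for r in batch]
reconcile((r["claim_id"] for r in records), loaded)  # no exception: nothing dropped
```

Failing loudly on a mismatch, rather than logging and continuing, is what turns a dropped record into a visible task failure.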

Masking Sensitive Data

PHI gets masked through AWS Comprehend Medical before any data reaches the LLM. Every decision is logged to PostgreSQL for audit trails. There’s a human-in-the-loop step: outputs below 0.7 confidence go to review before persisting. The LLM operates at temperature 0.1 for near-deterministic extraction. Applying these safeguards to regulated healthcare documents is the difference between a production system and a liability.
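The human-in-the-loop gate can be sketched as a simple routing rule. The 0.7 threshold comes from the description above; the `Extraction` record and the `route` and `partition` helpers are hypothetical names for illustration, not the production code.

```python
from dataclasses import dataclass, field

REVIEW_THRESHOLD = 0.7  # from the text: outputs below 0.7 confidence go to review


@dataclass
class Extraction:
    document_id: str
    confidence: float
    fields: dict = field(default_factory=dict)


def route(extraction: Extraction, threshold: float = REVIEW_THRESHOLD) -> str:
    """Return 'review' for low-confidence outputs, otherwise 'persist'."""
    return "review" if extraction.confidence < threshold else "persist"


def partition(extractions, threshold: float = REVIEW_THRESHOLD):
    """Split extractions into (to_persist, to_review) lists, preserving order."""
    to_persist, to_review = [], []
    for item in extractions:
        target = to_review if route(item, threshold) == "review" else to_persist
        target.append(item)
    return to_persist, to_review
```

An output at exactly 0.7 is persisted, since only values strictly below the threshold are sent to review. In a real system each routing decision would also be written to the audit log.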

Architecture With Purpose

The MetagenAI architecture I contributed to had clear separation: data ingestion, reasoning, review, reporting. Each layer works independently, so when something breaks, you fix that piece without touching the rest. Safeguards include schema drift detection in Neo4j, embedding drift monitoring with golden questions, and structured validation rules before anything persists. This clarity is part of why the project secured $2M in funding and the team grew from 5 to 15+.
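One way to monitor embedding drift with golden questions is to keep a baseline embedding for each question and compare it against a freshly computed one. The sketch below assumes that approach. The 0.95 similarity floor and the `detect_drift` helper are illustrative assumptions, and the real framework may compare retrieval results rather than raw vectors.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def detect_drift(baseline: dict, current: dict, min_similarity: float = 0.95) -> list:
    """Return IDs of golden questions whose current embedding drifted from baseline.

    A question counts as drifted if its embedding is missing, has a different
    dimension, or falls below the (assumed) similarity floor.
    """
    drifted = []
    for question_id, base_vec in baseline.items():
        cur_vec = current.get(question_id)
        if cur_vec is None or len(cur_vec) != len(base_vec):
            drifted.append(question_id)
        elif cosine_similarity(base_vec, cur_vec) < min_similarity:
            drifted.append(question_id)
    return drifted
```

Running this on a schedule against a fixed question set gives an early signal when an embedding model or preprocessing step changes underneath the retrieval layer.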

Everybody On the Same Page

On PIT, I built the Databricks pipeline jobs and API endpoints, tested across multiple environments before go-live. The system handles 1,100+ daily users with monitoring and alerting through Splunk and CloudWatch. Data synchronization improved from 24 hours to 2 hours. Issues get caught early through dashboards and automated alerts.

Selected Work


Education


University of Connecticut

M.S., Quantitative Economics

Master's program focused on econometric modeling, statistical analysis, and quantitative methods. Developed strong foundation in data analysis and modeling techniques that directly translate to production ML systems and data platform engineering.

Storrs, CT
University of Connecticut

B.S., Economics & Microbiology

Dual degree combining quantitative economics with biological sciences. Built analytical and research skills through coursework in statistical methods, data analysis, and scientific research methodologies.

Storrs, CT

Honors & Awards


Amazon Web Services

AWS Cloud Practitioner

Foundational AWS certification demonstrating cloud concepts, services, security, architecture, pricing, and support.

The Cigna Group

CEO Recognition - 1st Place Innovation Project

Won 1st place in TECDP Summer Innovation Project 2022, recognized by CEO David Cordani during company-wide town hall for COVID-19 population health analytics work.

Aug 2022

Secured $2M AI Infrastructure Funding

Led MetagenAI project that drove AI department expansion for FY2025, demonstrating technical feasibility and ROI through full-stack implementation.

Jul 2024