Ingress/Case Studies/Healthcare Data Pipeline

Three EMRs. One clean data layer.

A regional healthcare network spanning four hospitals and 18 clinics managed patient data in three separate EMR systems, while billing and scheduling lived elsewhere. We built a HIPAA-compliant data engineering platform on Snowflake, with Apache Airflow orchestration, dbt transformations, and PHI tokenization, reducing reporting latency from 72 hours to 4 hours.

Healthcare · Data Engineering | 8 months | Airflow · dbt · Snowflake
Pipeline Uptime
99.97%
Reliable orchestration across EMR systems
Reporting Latency
72 hrs → 4 hrs
From three-day manual consolidation to a four-hour automated pipeline
Context

Patient data, finally unified.

The health network had grown through acquisition and merger, leaving patient data scattered across three different EMR platforms. Clinical staff struggled to access complete patient history, while analytics teams couldn't answer basic questions about network performance, patient outcomes, or operational efficiency because data lived in silos.

Billing data sat in a separate system, and scheduling information lived on yet another platform. Reporting depended on manual exports and Excel consolidation, which took three days. The organization needed a unified, HIPAA-compliant data foundation that could support both operational reporting and clinical analytics.

  • Three EMR silos. Each hospital or clinic group operated on a different system with no patient record linkage across locations.
  • Fragmented operational data. Billing and scheduling were separate systems, making revenue cycle and capacity analysis impossible at the enterprise level.
  • Manual reporting process. Analytics required a 72-hour turnaround, limiting the organization's ability to respond to trends or performance issues.
  • HIPAA compliance complexity. Any data integration solution had to protect PHI, maintain audit trails, and meet encryption and access control requirements.
Approach

How we built it.

We analyzed all source systems, designed FHIR-compliant mappings, built Airflow DAGs to orchestrate nightly extractions, layered dbt transformations to build dimensional models in Snowflake, and added a PHI tokenization layer for privacy and audit compliance.

01.
EMR Source Analysis & FHIR Mapping
Audited all three EMR systems and mapped patient, encounter, clinical, and billing entities to FHIR standards. Designed deduplication logic to resolve patients across systems.
HL7 FHIR · Data Modeling
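
As an illustration of the mapping step, here is a minimal sketch of how one flat source row could be reshaped into a FHIR R4 `Patient` resource. The column names (`mrn`, `first_name`, `last_name`, `dob`, `sex`), the `MM/DD/YYYY` date format, and the `urn:example:...` identifier namespace are assumptions made for the example, not the network's actual schema.

```python
from datetime import datetime

# Hypothetical coding of sex in the source export; real systems vary.
GENDER_MAP = {"M": "male", "F": "female"}


def to_fhir_patient(row: dict, source_system: str) -> dict:
    """Map one flat source row to a minimal FHIR R4 Patient resource."""
    # FHIR expects birthDate as an ISO 8601 date (YYYY-MM-DD).
    birth_date = datetime.strptime(row["dob"], "%m/%d/%Y").date().isoformat()
    return {
        "resourceType": "Patient",
        "identifier": [
            {
                # Hypothetical namespace, one per source EMR, so MRNs
                # from different systems never collide.
                "system": f"urn:example:emr:{source_system}:mrn",
                "value": row["mrn"].strip(),
            }
        ],
        "name": [
            {
                "family": row["last_name"].strip().title(),
                "given": [row["first_name"].strip().title()],
            }
        ],
        # Anything unrecognized falls back to the FHIR code "unknown".
        "gender": GENDER_MAP.get(row.get("sex", "").upper(), "unknown"),
        "birthDate": birth_date,
    }
```

Keeping the source system in the identifier namespace preserves the original MRN for audit purposes while leaving cross-system matching to a later step.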
02.
Airflow Orchestration & S3 Landing Zone
Built Apache Airflow DAGs that extract from the three EMR systems nightly and land raw data in AWS S3, with retry logic, error alerting, and data quality checks. Workflows run on a fixed schedule for consistent timing.
Apache Airflow · AWS S3
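
The retry-and-alert behavior described in this step can be sketched in plain Python. This is a library-free stand-in rather than the project's DAG code: in Airflow itself, the same effect is typically configured through task arguments such as `retries` and `retry_delay`. The function name `run_with_retries` and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    task: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    on_failure: Callable[[Exception], None] = print,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`, retrying with exponential backoff; alert on final failure."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == retries:
                # Out of attempts: fire the alert hook, then re-raise so
                # the failure stays visible to whatever scheduled the run.
                on_failure(exc)
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")  # loop always returns or raises
```

Passing `sleep` and `on_failure` as parameters keeps the sketch testable without real waiting, and lets the alert hook be swapped for an email or pager integration.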
03.
dbt Transformations & Snowflake Warehouse
Built a dbt transformation layer to apply FHIR logic, create patient bridges, normalize clinical and billing data, and build dimensional fact tables for analytics. Deployed to the Snowflake data warehouse.
dbt · Snowflake
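
To show what a patient bridge does, here is a minimal Python sketch of the idea. The project implemented this layer in dbt, so the code below is only an illustration of the logic, and it assumes a simple exact match on normalized name and birth date. Production matching usually needs fuzzier rules, and the field names are hypothetical.

```python
from typing import Iterable


def match_key(rec: dict) -> tuple:
    """Normalize the fields used to decide two records are the same person."""
    return (
        rec["last_name"].strip().lower(),
        rec["first_name"].strip().lower(),
        rec["birth_date"],
    )


def build_patient_bridge(records: Iterable[dict]) -> list[dict]:
    """Assign one master_patient_id to every source record with the same key."""
    master_ids: dict[tuple, int] = {}
    bridge = []
    for rec in records:
        key = match_key(rec)
        # First time a key is seen it gets the next sequential master ID;
        # later records with the same key reuse it.
        master_id = master_ids.setdefault(key, len(master_ids) + 1)
        bridge.append(
            {
                "master_patient_id": master_id,
                "source_system": rec["source_system"],
                "source_patient_id": rec["source_patient_id"],
            }
        )
    return bridge
```

Each bridge row links one source-system patient ID to a master ID, so downstream fact tables can join on the master ID and see a patient's history across all three EMRs.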
04.
PHI Tokenization & HIPAA Validation
Implemented tokenization for all PHI (MRN, patient names, dates, SSN), maintained mapping tables, configured row-level security, and passed a HIPAA audit. Built a compliance reporting dashboard.
HIPAA Compliance · Tokenization
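
One common way to tokenize PHI while keeping a mapping table is a keyed hash (HMAC). The sketch below uses that technique as an assumption, since the case study does not state which scheme was used. The class name `PhiTokenizer` and the `tok_` prefix are hypothetical, and the in-memory dictionary stands in for an access-controlled mapping table.

```python
import hashlib
import hmac


class PhiTokenizer:
    """Deterministic PHI tokenization with a token-to-value mapping."""

    def __init__(self, secret_key: bytes):
        # Assumption: in practice the key would come from a secrets
        # manager, never from source code.
        self._key = secret_key
        # token -> original value; stands in for a restricted mapping table.
        self.mapping: dict[str, str] = {}

    def tokenize(self, field: str, value: str) -> str:
        # Including the field name means the same raw string in two
        # different fields (e.g. MRN vs SSN) yields different tokens.
        message = f"{field}:{value}".encode("utf-8")
        digest = hmac.new(self._key, message, hashlib.sha256).hexdigest()
        token = f"tok_{digest[:32]}"
        self.mapping[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Raises KeyError for unknown tokens.
        return self.mapping[token]
```

Because the same input always produces the same token, tokenized columns can still be joined and counted in the warehouse, while re-identification requires access to the mapping.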
Outcomes

What it delivered.

99.97%

Pipeline reliability

Automated nightly orchestration with failover and retry logic. Analytics team no longer manually consolidates data. Infrastructure is self-healing with SLA monitoring.

4 hours

Report latency

From 72-hour manual consolidation to a 4-hour automated pipeline. Clinical and operations teams now make decisions on fresh data instead of three-day-old snapshots.

100% PHI

Compliance covered

All protected health information (PHI) tokenized with mapping controls. HIPAA audit passed. Access controls and encryption in place for all data flows.

Tech Stack

What we used.

Apache Airflow

Workflow orchestration for EMR extractions, scheduling, retry logic, and error alerting. DAGs ensure consistent, repeatable nightly data pipelines.

dbt (Data Build Tool)

SQL-first transformation framework for FHIR mappings, patient deduplication, dimensional modeling, and data quality. Lineage tracking for compliance audits.

โ„๏ธ

Snowflake

Cloud data warehouse hosting unified patient, clinical, and billing dimensions with instant scalability and zero-copy cloning for secure test environments.

Python & Tokenization

Data transformation logic, PHI encryption and tokenization, HL7 FHIR processing with pandas and PySpark. Full HIPAA audit trail maintenance.

Start a conversation

Tell us what's worth doing.

// 30 minutes → a written brief.

Bring the problem. We'll come back with a written brief: what to build, what to defer, and where AI actually moves the number. No deck pitches.

Email: connect@ingressits.com
GSA MAS: #47QTCA26D000K
Reply: < 24 hrs