Aretum is a mission-driven organization committed to delivering innovative, technology-enabled solutions to customers across various sectors. Aretum is seeking a skilled, highly motivated Data Engineer to build and manage data ingestion, transformation, reconciliation, and analytics pipelines.
Responsibilities:
- Ingest data from FHIR APIs, CDW, and other VA sources
- Normalize and reconcile medication and patient data
- Build transformation pipelines for risk scoring inputs
- Support batch and near-real-time processing
- Ensure data quality, consistency, and traceability
Requirements:
- Programming: Python (primary), SQL (advanced), optional Scala
- Data Processing Frameworks: Apache Spark, AWS EMR, Databricks (preferred)
- ETL/ELT Design: Pipeline orchestration, incremental vs full loads, data validation
- API Integration: REST APIs, JSON parsing, pagination, authentication (OAuth2)
- FHIR Data Handling: core resources such as Patient, MedicationRequest, and Observation
- Data Modeling: Relational and semi-structured schema design
- Data Quality & Validation: Deduplication, reconciliation logic, anomaly detection
- Streaming vs Batch Processing: Understanding tradeoffs and implementation patterns
- Storage Technologies: S3, relational DBs, NoSQL basics
- Performance Optimization: Partitioning, parallelization, query tuning
- Versioning & Lineage: Data version control, reproducibility of datasets
- U.S. Work Authorization: Due to federal contract requirements, only U.S. citizens are eligible for this position
- Ability to obtain and maintain a Public Trust or Suitability Determination, depending on the agency's background investigation requirements
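To illustrate the FHIR API integration skills listed above (REST, JSON parsing, pagination), here is a minimal sketch of walking a paginated FHIR search result. It assumes only the standard FHIR Bundle shape, where each page carries a `link` entry with relation `"next"` while more pages exist; the `fetch` callable is a placeholder for whatever HTTP client wrapper (e.g. one attaching an OAuth2 bearer token) the actual pipeline would use, not a reference to Aretum's stack.

```python
# Sketch: iterate all resources across a paginated FHIR search response.
# A FHIR search returns a Bundle resource; when more results exist, its
# "link" array contains an entry with relation "next" pointing at the
# following page. `fetch` is any callable mapping a URL to the parsed
# JSON Bundle (hypothetical stand-in for an authenticated HTTP call).
from typing import Callable, Iterator

def iter_bundle_entries(fetch: Callable[[str], dict], first_url: str) -> Iterator[dict]:
    """Yield each entry's resource from every page of a FHIR search Bundle."""
    url = first_url
    while url:
        bundle = fetch(url)
        for entry in bundle.get("entry", []):
            yield entry["resource"]
        # Follow the "next" link if present; stop when no further page exists.
        url = next(
            (link["url"] for link in bundle.get("link", [])
             if link.get("relation") == "next"),
            None,
        )
```

Keeping the HTTP transport behind a callable keeps pagination logic testable offline and lets the same iterator serve Patient, MedicationRequest, or Observation searches unchanged.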