AustinWorks builds AI infrastructure for the healthcare industry. The company is seeking a Data Integration Engineer to design, build, and deploy production-grade data pipelines that transform raw healthcare data into reliable datasets for its AI platform.

Responsibilities:

Design and build ETL pipelines using PySpark, SQL, Python, and cloud data tools
Work with large, messy datasets from multiple client systems and transform them into production-ready formats
Build data validation frameworks to identify schema issues, data quality gaps, transformation errors, and performance bottlenecks
Troubleshoot complex integration problems across client data sources, internal systems, and production pipelines
Develop reusable data components and standardized integration patterns
Optimize distributed data processing workflows for scale and reliability
Partner with implementation, product, and infrastructure teams to move integrations into production
Translate partner requirements into clear technical specifications
Participate in technical conversations with client IT teams around schemas, access methods, and integration constraints
Document pipelines, transformation logic, and integration patterns

Requirements:

5+ years of experience in data engineering, data analytics engineering, solution delivery, or technical implementation
Strong PySpark experience, ideally working with large-scale distributed data processing
Advanced SQL skills for querying, transformation, and debugging
Production ETL experience, including error handling, monitoring, and data quality validation
Strong Python skills
Experience working with messy, inconsistent, or multi-source datasets
Ability to independently diagnose data quality, schema, and pipeline issues
Strong attention to detail and commitment to reliable production systems
Clear communication skills and comfort partnering with non-technical or semi-technical stakeholders
Scrappy, adaptable mindset; able to operate in ambiguous, fast-moving environments
Experience in consulting, technical implementation, solutions engineering, or forward deployed engineering
Startup experience, especially at a Series A/B company
Experience with Databricks or similar distributed data platforms
Experience with Azure, GCP, or modern cloud data infrastructure
Healthcare data experience, including EHR, claims, clinical, lab, or imaging data
Familiarity with Epic Clarity, FHIR, HL7v2, DICOM, LOINC, SNOMED, or ICD-10
Experience with Azure Databricks, Data Factory, Blob Storage, Delta Lake, Snowflake, or Fabric
Familiarity with Git, CI/CD, Terraform, or infrastructure-as-code
Experience with JSON, XML, Parquet, CSV, or other structured/semi-structured data formats
