AustinWorks builds AI infrastructure for the healthcare industry. The company is seeking a Data Integration Engineer to design, build, and deploy production-grade data pipelines that transform raw healthcare data into reliable datasets for its AI platform.
Responsibilities:
- Design and build ETL pipelines using PySpark, SQL, Python, and cloud data tools
- Work with large, messy datasets from multiple client systems and transform them into production-ready formats
- Build data validation frameworks to identify schema issues, data quality gaps, transformation errors, and performance bottlenecks
- Troubleshoot complex integration problems across client data sources, internal systems, and production pipelines
- Develop reusable data components and standardized integration patterns
- Optimize distributed data processing workflows for scale and reliability
- Partner with implementation, product, and infrastructure teams to move integrations into production
- Translate partner requirements into clear technical specifications
- Participate in technical conversations with client IT teams around schemas, access methods, and integration constraints
- Document pipelines, transformation logic, and integration patterns
Requirements:
- 5+ years of experience in data engineering, data analytics engineering, solution delivery, or technical implementation
- Strong PySpark experience, ideally working with large-scale distributed data processing
- Advanced SQL skills for querying, transformation, and debugging
- Production ETL experience, including error handling, monitoring, and data quality validation
- Strong Python skills
- Experience working with messy, inconsistent, or multi-source datasets
- Ability to independently diagnose data quality, schema, and pipeline issues
- Strong attention to detail and commitment to reliable production systems
- Clear communication skills and comfort partnering with non-technical or semi-technical stakeholders
- Scrappy, adaptable mindset; able to operate in ambiguous, fast-moving environments
- Experience in consulting, technical implementation, solutions engineering, or forward-deployed engineering
- Startup experience, especially at a Series A/B company
- Experience with Databricks or similar distributed data platforms
- Experience with Azure, GCP, or modern cloud data infrastructure
- Healthcare data experience, including EHR, claims, clinical, lab, or imaging data
- Familiarity with Epic Clarity, FHIR, HL7v2, DICOM, LOINC, SNOMED, or ICD-10
- Experience with Azure Databricks, Data Factory, Blob Storage, Delta Lake, Snowflake, or Fabric
- Familiarity with Git, CI/CD, Terraform, or infrastructure-as-code
- Experience with JSON, XML, Parquet, CSV, or other structured/semi-structured data formats