InStride Health is on a mission to deliver specialty anxiety and OCD care for children and families. They are seeking a Data Engineer II to design, build, and maintain scalable data infrastructure that supports clinical and operational decision-making.
Responsibilities:
- Design, develop, and maintain robust, scalable ETL/ELT data pipelines using Python, SQL, and data processing frameworks including dbt, Matillion, and AWS services
- Implement data quality checks, monitoring, and alerting across all data pipelines to ensure data integrity and reliability
- Optimize existing pipelines for performance, cost-efficiency, and error handling
- Contribute to the design and maintenance of InStride’s data warehouse and data lake solutions, including schema design, data modeling, and sort and distribution key strategies in Amazon Redshift
- Ensure data security, HIPAA compliance, and proper handling of protected health information (PHI) within all data infrastructure
- Work closely with data analysts, data scientists, and business intelligence engineers to understand their data requirements and deliver reliable, high-quality data access
- Troubleshoot and resolve complex production data issues promptly, with rigorous root-cause analysis
- Develop and maintain clear documentation for data models, pipelines, and data sources to enable self-service analytics
- Participate in code reviews and contribute to technical discussions, bringing a constructive and detail-oriented perspective
- Stay current on emerging data technologies and tools, bringing relevant insights to the team
Requirements:
- 3+ years of experience designing, developing, and deploying data pipelines and data warehouse solutions in production environments
- Strong proficiency in SQL and Python for data engineering and transformation work
- Hands-on experience with cloud data warehouses (Amazon Redshift preferred; Snowflake or BigQuery also valued) and familiarity with ETL/ELT tools such as dbt, Matillion, or similar
- Working knowledge of AWS services relevant to data engineering (e.g., S3, Glue, Lambda, Redshift)
- Demonstrated understanding of HIPAA compliance requirements and experience working with sensitive or regulated data
- Ability to design and build scalable, observable data systems that can handle growth in data volume and complexity
- Strong understanding of data integration patterns, including APIs, webhooks, and batch ingestion techniques
- Experience giving and receiving structured feedback through pull request reviews and technical discussions
- Strong communication skills, with the ability to translate complex technical concepts for both technical and non-technical stakeholders
- Comfort operating in a fast-paced startup environment, balancing competing priorities and making sound tradeoffs
- Experience working with healthcare data is a plus