H1 is a company dedicated to improving healthcare access through innovative data solutions. The company is seeking a Staff Data Engineer to lead high-visibility projects and deliver scalable, reliable data architectures for critical data assets.
Responsibilities:
- Act as a self-starter, taking ownership and driving execution with minimal day-to-day direction
- Lead high-visibility real-world evidence (RWE) projects, starting with claims data, and keep multiple initiatives moving by proactively unblocking teams
- Own the end-to-end architecture for critical data assets, ensuring solutions are scalable, reliable, and aligned with H1’s long-term vision
- Design, build, and optimize large-scale data pipelines (hundreds of TBs) for performance, reliability, and cost efficiency
- Partner with Product, Data Science, and downstream engineering teams to align priorities, manage dependencies, and deliver high-value outcomes
- Represent engineering in cross-functional forums, shaping roadmaps and reducing reliance on senior leadership for day-to-day decisions
- Develop deep domain expertise and mentor other engineers, helping raise the technical bar and influence the evolution of our data products
Requirements:
- 8+ years as a software, data, or backend engineer building and operating scalable, production-grade systems
- Experience with large-scale data processing (e.g., Spark/PySpark on EMR or similar) or scalable distributed backend systems, with the ability to quickly deepen expertise in our data stack (PySpark, EMR, Hudi/Delta)
- Strong proficiency in SQL, including writing and optimizing complex queries over large datasets
- Strong programming experience in Python (or a modern language with the ability to quickly ramp up in Python)
- Experience designing systems or large-scale datasets/pipelines with attention to performance, reliability, and maintainability
- Hands-on experience with modern engineering workflows and tooling such as Git, JIRA, and CI/CD systems (e.g., CircleCI)
- Comfort deploying and troubleshooting distributed workloads in cloud environments such as AWS EMR or Kubernetes
- Experience with workflow orchestration or job scheduling tools (e.g., Airflow, Argo)
- Demonstrated ability to independently drive complex, cross-team technical initiatives and influence stakeholders without formal authority
- Experience with streaming/messaging technologies (e.g., Kafka, Kinesis) is a nice-to-have
- Background in RWE, healthcare data, or other complex/regulated data domains is preferred
- Experience using AI-assisted coding tools (e.g., GitHub Copilot, Claude Code) to accelerate development while maintaining quality is encouraged