H1 is a company dedicated to improving healthcare access and equity through data and AI technology. They are seeking a Senior Backend Software Engineer to design and scale systems for their data platform, collaborating with cross-functional teams to deliver high-performance data solutions.
Responsibilities:
- Develop strategies and frameworks to capture web data at scale
- Design, develop, and maintain scalable data extraction frameworks that ingest structured and unstructured data from diverse sources
- Build and optimize robust ETL/ELT pipelines using big data technologies, especially Apache Spark on cloud platforms (preferably AWS EMR)
- Improve the efficiency, reliability, and performance of data processing systems through thoughtful design and continuous optimization
- Transform, clean, and normalize complex datasets for downstream use, ensuring high standards of data quality and consistency
- Partner with senior engineers to evolve H1’s data architecture and infrastructure in support of product and platform scalability
- Lead data integration efforts across multiple systems, ensuring accuracy and seamless collaboration across teams
- Monitor and troubleshoot data flows and pipelines, proactively identifying and resolving performance issues
- Maintain clear documentation of systems, workflows, and processes to promote transparency and operational excellence
- Participate in code reviews and promote a culture of engineering excellence, mentorship, and continuous improvement
- Collaborate closely with cross-functional teams to align technical execution with business goals
Requirements:
- 5+ years of professional experience in data engineering or software engineering, working with large-scale data systems and pipelines
- Strong proficiency in Python
- Proficiency in web scraping strategies and tools, including curl, network traffic analysis, proxies, and Selenium/Playwright
- Strong SQL skills and experience with PostgreSQL
- Experience with big data tools like Apache Spark, particularly on cloud platforms, with a preference for AWS EMR
- Experience with Docker or other containerization technologies
- Understanding of Large Language Models (LLMs) and their applications
- Familiarity with model training and fine-tuning, particularly in natural language processing (NLP) contexts
- Basic knowledge of networking, security, and encryption protocols such as HTTP, HTTPS, and TLS
- Strong analytical and problem-solving skills with a focus on data quality and performance optimization
- Passion for writing clean, efficient code and following best practices