PathAI is dedicated to improving patient outcomes with AI-powered pathology. They are seeking an experienced contract Back-End Developer to enhance the scalability, performance, and maintainability of their ML data infrastructure, collaborating closely with MLOps and ML engineering teams.
Responsibilities:
- Analyze and optimize storage strategies for ML experiment data and metadata
- Design and implement intelligent retention and expiration policies for large-scale datasets
- Modernize and refactor ETL/ELT pipelines to improve scalability and ease of maintenance
- Create and populate additional schemas for validated and curated datasets
- Build or enhance database-backed applications supporting ML R&D and production analytics
- Collaborate with ML engineers, SREs, and platform teams
- Provide knowledge transfer for long-term maintainers
Requirements:
- Proficiency in Python for application development, data processing, and automation
- Expertise with relational databases (e.g., Postgres, Amazon RDS, Aurora), including schema design, query optimization, and performance tuning
- Expertise with ELT pipelines (dbt preferred) and cloud data warehousing (Snowflake preferred)
- Familiarity with big data technologies such as Spark and Hive
- Experience with Apache Airflow for workflow automation
- Understanding of S3-based storage and large-scale data management strategies
- Ability to write clear technical documentation and collaborate effectively across teams
- Experience with query optimization, data partitioning strategies, and cost optimization in cloud environments
- Background in machine learning data pipelines or analytics-heavy environments
- Knowledge of data governance and retention policies in cloud environments