ArteraAI is an AI startup focused on developing medical artificial intelligence tests to personalize therapy for cancer patients. The Software Engineer on the Compute Platform team will develop tools and services to enhance developer experience and automate infrastructure operations, collaborating closely with AI scientists and ML engineers.
Responsibilities:
- Programmatically design, build, and maintain the compute infrastructure (AWS EKS/Kubernetes, ECS, Lambda, and EC2) that powers Artera's AI products at scale
- Work closely with stakeholders to define and refine the platform's architecture, ensuring scalability, observability, reliability, and performance
- Build out core infrastructure, tooling, and software development processes
- Work closely with machine learning engineers to optimize training and inference workflows with efficiency and cost in mind
- Contribute to a range of platform engineering projects, from one-off solutions to long-term systems
- Contribute to cloud infrastructure security tools and services
Requirements:
- 2+ years managing containerized infrastructure at scale with a strong security mindset
- 3+ years building services and tools using Python with a software engineering mindset
- 2+ years building infrastructure automation solutions using Infrastructure-as-Code (e.g., Terraform, AWS CDK)
- Experience with AWS storage services: S3, EFS, FSx
- Experience with CI/CD pipelines
- Demonstrated integrity and consideration for appropriate data governance when working with sensitive, confidential data
- This is a remote role open to candidates who are currently authorized to work either in the United States or in Canada without the need for current or future employment-based visa sponsorship
- Experience with infrastructure observability and monitoring tools (e.g., Grafana, Prometheus, Datadog)
- Experience supporting ML/AI workloads: GPU instance management, training cluster optimization, batch inference pipelines
- Familiarity with cost optimization strategies for cloud compute at scale
- Experience with secrets management and cloud security tooling (e.g., AWS IAM, Vault, KMS)
- Contributions to internal developer tooling or platform libraries consumed by cross-functional teams