Implement and operate CI/CD pipelines to enable safe, repeatable deployments and rollbacks.
Provision and manage backend resources for inference (compute, containers, scaling), and tune performance, reliability, and cost in production.
Define and continuously monitor health and performance metrics for deployed services.
Triage issues by severity and drive timely resolution, including handling incident response and maintaining runbooks.
Own end-to-end REST API integration, connecting backend model services to product and platform surfaces through scalable, containerized services.
Work with researchers, evaluation engineers, product managers, and partner engineering teams to deliver production-ready solutions, communicate status and risks, and escalate when needed.
Requirements
BS or MS in Computer Science or Computer Engineering, or equivalent industry experience.
3+ years of professional software engineering experience building and operating production services.
Experience automating testing and deployments using CI/CD, including release workflows that support safe rollouts and rollbacks.
Experience building and operating cloud-hosted, containerized services (for example, Docker and Kubernetes, or similar), including provisioning resources and scaling inference workloads.
Experience building REST APIs using Python-based frameworks (or similar) and integrating backend services with product or platform consumers.
Strong software engineering fundamentals: version control, code quality, and writing maintainable, testable software.
Strong written communication skills to document architectures, runbooks, and operational processes.