Optura.AI is healthcare’s AI orchestration platform, aiming to revolutionize how healthcare organizations deploy and operationalize AI in production. They are seeking a Senior Platform Engineer to design, build, and operate core services for their AI Platform and to keep those services secure and scalable in real-world healthcare environments.
Responsibilities:
- Build core platform services in Python and TypeScript for orchestration, routing, model gateways, retrieval-augmented generation (RAG), and evaluation pipelines
- Leverage AI-assisted development tools (e.g., Claude, Cursor) alongside tests, linters, and benchmarks to improve velocity and quality
- Own services from design through deployment, including SLO creation, dashboards, runbooks, and operational readiness
- Improve reliability by optimizing system latency, availability, performance, and cost; lead and participate in incident response and postmortems
- Develop production AI capabilities including guardrails, prompt and version management, offline and online evaluations, and multi-provider integrations
- Build and maintain data and storage systems including vector search (pgvector, Pinecone, OpenSearch), caching, and Postgres/RDS patterns
- Implement security and compliance best practices aligned to HIPAA, including RBAC, audit logging, least-privilege access, and secrets management
Requirements:
- 5+ years of software engineering experience with strong proficiency in Python and TypeScript
- 2+ years of experience operating AI systems in production (agentic workflows, RAG, orchestration, or similar)
- Experience operating in cloud environments, including containers (Kubernetes/EKS or ECS) and Terraform
- Experience designing and operating distributed systems with a focus on performance optimization and deep debugging
- Experience with observability systems (metrics, tracing, logging) and on-call ownership
- Experience working in healthcare or other regulated industries, including HIPAA or PHI-handling practices
- Experience with LLMOps, including prompt management, evaluation frameworks, guardrails, and cost and latency tuning
- Experience building or operating model gateways, traffic shaping, multi-provider routing, and caching at scale