Design, build, and maintain scalable ML, AI, and data-platform services that power production features end-to-end.
Operate LLM applications in production: chat with memory, retrieval, prompt management, versioning, and experimentation.
Build and run structured evaluation pipelines (golden datasets, regression checks) so changes ship with confidence.
Own AI/ML observability and monitoring: instrument, trace, and debug model behavior in production with tools like Langfuse or MLflow (experience with one is enough).
Integrate and serve LLM-agnostic model backends (e.g., Anthropic models via Bedrock and Vertex) and support deterministic/custom ML where open-source models fall short (e.g., document/PDF parsing).
Contribute to data lineage, governance, and compliance as the platform matures.
Partner closely with product and the broader team to move fast without sacrificing quality.
Requirements
Strong software engineering background: you've built and run scalable, cloud-native SaaS in production (not just notebooks).
Hands-on experience running LLMs / AI systems in production: prompting, evaluation, observability, monitoring.
AI-native workflow fluency: you build with AI coding tools every day and know how to drive them, not take orders from them.
Solid Python and comfort across the stack (backend-leaning is fine).
Experience with data pipelines and ML/AI infrastructure on GCP or AWS.
Strong written and spoken English; comfortable operating on US hours and embedded in a client team.