Cellebrite is a leading company focused on enhancing digital investigations and intelligence gathering to protect and save lives. The company is seeking a Senior DevOps / Cloud Engineer to manage application services, cloud infrastructure, and deployment pipelines while ensuring production reliability in a rapidly evolving GenAI environment.
Responsibilities:
- Own and manage application services running on GCP infrastructure, including serverless and managed services
- Design and maintain robust CI/CD pipelines for rapid, safe deployments
- Operate and optimize GenAI/LLM workloads in production, including RAG pipelines and agentic workflows
- Monitor and improve latency, cost, and reliability of AI-driven systems
- Troubleshoot complex production issues across application, data, and infrastructure layers
- Work with BigQuery-based data workflows and optimize their queries and performance
- Support and debug multi-step AI pipelines and agent orchestration flows
- Implement and maintain observability (logging, metrics, tracing, alerting), including for AI pipelines
- Collaborate with engineering teams on architecture improvements for evolving GenAI systems
- Partner with Customer Success to investigate and resolve customer-impacting issues (minimal direct customer interaction)
- Enforce security and best practices in a sensitive data environment
Requirements:
- 5+ years of experience in DevOps / SRE / Cloud Engineering
- Strong hands-on experience with Google Cloud Platform (GCP)
- Proven experience with serverless architectures (Cloud Run, Cloud Functions, or similar)
- Experience working with BigQuery (querying, performance tuning, troubleshooting)
- Experience running and supporting production SaaS applications
- Hands-on experience with GenAI / LLM-based applications in production (including RAG systems, model APIs, or similar)
- Experience supporting or operating multi-step AI pipelines or agentic workflows
- Strong experience with CI/CD pipelines (GitHub Actions or similar)
- Solid scripting/programming skills (Python, TypeScript, Bash, or similar)
- Experience with observability and monitoring tools
- Experience optimizing LLM performance, cost, and reliability at scale
- Familiarity with vector databases, embeddings, and retrieval systems
- Experience with infrastructure as code (Terraform or similar)
- Background in secure or regulated environments
- Experience in fast-scaling or experimental product environments