Reddit is a community of communities, home to the most open and authentic conversations on the internet. Reddit is seeking a Senior Software Engineer to lead the development of a large-scale GenAI Platform, focusing on the design, implementation, and maintenance of AI systems and infrastructure.
Responsibilities:
- Lead the development of a large-scale GenAI Platform at Reddit
- Contribute to the design, implementation, and maintenance of the LLM Gateway, focusing on features such as unified API endpoints for internally and externally hosted LLMs, rate and token limit management, and intelligent failover mechanisms that improve uptime and reliability
- Design and develop ML and Generative AI systems in cloud-based production environments at scale
- Build and manage enterprise-grade RAG applications using embeddings, vector search, and retrieval pipelines
- Implement and operationalize agentic AI workflows with tool use, built on frameworks such as LangChain and LangGraph
- Drive adoption of MLOps / LLMOps practices, including CI/CD automation, versioning, testing, and lifecycle management
- Establish best practices for observability, monitoring, evaluation, and governance of GenAI pipelines in production
Requirements:
- 5+ years of experience in ML Engineering, AI Platform Engineering, or Cloud AI Deployment roles
- Experience operating orchestration systems such as Kubernetes at scale
- Deep experience with cloud-based technologies for supporting an ML platform, such as AWS, Google Cloud Storage, and infrastructure-as-code tools (Terraform)
- Proficiency with the common programming languages of ML, such as Go and Python
- Excellent communication skills with the ability to articulate technical AI concepts to non-technical stakeholders
- Strong focus on scalability, reliability, performance, and ease of use
- Strong ownership mindset and platform thinking
- Ability to lead AI platform delivery from concept to production
- Strong knowledge of model serving, inference pipelines, monitoring, and observability for AI systems
- Strong proficiency in Python and experience with modern AI/ML frameworks (e.g. LangChain, Vertex AI Agent Builder, TensorFlow, PyTorch)