Architect, implement, and optimize data platforms and pipelines specifically designed to support LLMs, Retrieval-Augmented Generation (RAG), and sophisticated agentic AI systems at exabyte scale
Drive the adoption and deployment of agentic workflows and agent harnessing techniques to create autonomous, data-driven security features
Design and implement highly scalable, fault-tolerant, and cost-effective data solutions, emphasizing rapid iteration and high-quality deployment
Write elegant, production-ready code with a focus on performance, maintainability, and testing rigor, ensuring the ability to ship fast without compromising quality
Provide technical leadership and deep expertise in data modeling, normalization, and semantic cataloging for AI/ML workloads
Establish best practices for MLOps/DataOps surrounding LLMs, including monitoring, observability, and zero-touch recovery mechanisms for AI services
Actively mentor engineers by conducting technical workshops, leading design reviews, and strengthening the team's knowledge of cutting-edge AI platform technologies
Collaborate across the organization with Data Scientists, Product Managers, and other engineering teams to transform research prototypes into robust, production-grade services
Own the end-to-end lifecycle of critical data services: development, testing, deployment, and monitoring
Requirements
Master’s degree or PhD in Computer Science, Data Engineering, or a related STEM field, or equivalent practical experience
10+ years of progressive experience in Data Engineering/Platform Engineering, with at least 3 years focused on architecting and building platforms for AI/ML or Data Science at massive scale
Demonstrable hands-on experience in LLM engineering (fine-tuning, prompt engineering, deployment), RAG, and agentic workflow development
Proven track record of designing and delivering large-scale distributed systems (sharding, partitioning, concurrency)
Exceptional ability to write clean, elegant, performant, and well-tested code, coupled with a proactive mindset for delivering results quickly
A thorough understanding of engineering practices, including effective peer code reviews, resilient architecture design, and comprehensive testing paradigms
Prior experience in a Principal- or Staff-level engineering role, demonstrating technical leadership and mentorship capabilities
Tech Stack
Distributed Systems
Benefits
Market-leading compensation and equity awards
Comprehensive physical and mental wellness programs
Competitive vacation and holidays to recharge
Paid parental and adoption leave
Professional development opportunities for all employees regardless of level or role
Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections