Zscaler accelerates digital transformation so that customers can be more agile, efficient, resilient, and secure. The Principal GenAI Data Engineer will drive the design and implementation of enterprise-grade Generative AI data ingestion and platform architectures, with a focus on robust pipelines that prepare enterprise data for AI workloads.
Responsibilities:
- Architect enterprise-scale GenAI data platforms for ingestion, transformation, enrichment, and serving of structured and unstructured data
- Design scalable pipelines for enterprise knowledge ingestion from diverse data sources including documents, SaaS platforms, knowledge bases, collaboration tools, and databases
- Define the architecture for metadata extraction, chunking, enrichment, embedding generation, and knowledge preparation workflows
- Design AI-ready data models and storage strategies for vector, graph, and hybrid knowledge systems
- Architect scalable unstructured data processing pipelines for text, images, PDFs, tables, and multimodal content
Requirements:
- Expert-level Python programming and software engineering capabilities
- Experience building distributed/scalable data pipelines for AI workloads
- Strong understanding of unstructured data extraction and processing pipelines
- Experience with vector databases, graph databases, and metadata/knowledge storage systems
- Hands-on experience with clustering and entity recognition algorithms, and with modern retrieval strategies (including RAG, search, and agentic AI workflows)
- Deep understanding of AI-ready data platform design principles and the ability to bridge platform/data engineering with GenAI/LLM application requirements
- Experience with LLMOps/GenAIOps and evaluation frameworks such as LangSmith, Arize Phoenix, Weights & Biases, or MLflow
- Familiarity with agent frameworks such as LangGraph, CrewAI, or Google ADK