Zscaler accelerates digital transformation so that customers can be more agile, efficient, resilient, and secure. The Principal GenAI Data Engineer will drive the design and implementation of enterprise-grade Generative AI data ingestion and platform architectures, with a focus on robust pipelines that prepare enterprise data for AI workloads.

Responsibilities:

Architect enterprise-scale GenAI data platforms for ingestion, transformation, enrichment, and serving of structured and unstructured data
Design scalable pipelines that ingest enterprise knowledge from diverse data sources, including documents, SaaS platforms, knowledge bases, collaboration tools, and databases
Define the architecture for metadata extraction, chunking, enrichment, embedding generation, and knowledge preparation workflows
Design AI-ready data models and storage strategies for vector, graph, and hybrid knowledge systems
Architect scalable unstructured data processing pipelines for text, images, PDFs, tables, and multimodal content

Requirements:

Expert-level Python programming and software engineering capabilities
Experience building distributed, scalable data pipelines for AI workloads
Strong understanding of unstructured data extraction and processing pipelines
Experience with vector databases, graph databases, and metadata/knowledge storage systems
Hands-on experience with clustering and entity recognition algorithms, and with modern retrieval strategies (including RAG, search, and agentic AI workflows)
Deep understanding of AI-ready data platform design principles and the ability to bridge platform/data engineering with GenAI/LLM application requirements
Experience with LLMOps/GenAIOps and evaluation frameworks such as LangSmith, Arize Phoenix, Weights & Biases, or MLflow
Familiarity with agent frameworks such as LangGraph, CrewAI, or Google ADK