Dice is seeking a Lead Data Engineer to transform a complex data ecosystem into a scalable, unified platform that powers analytics and AI. This hands-on leadership role involves building production-grade data systems that support advanced analytics and improved user experiences.
Responsibilities:
- Design, build, and own enterprise-grade data pipelines on Microsoft Azure / Fabric
- Integrate data across multiple enterprise platforms (APIs, files, and streaming sources)
- Develop scalable ingestion patterns (REST APIs, SFTP, batch & near real-time pipelines)
- Build curated, analytics-ready datasets (dimensional modeling / Data Vault)
- Ensure data quality, monitoring, SLAs, and alerting
- Implement secure and governed data movement (lineage, access control, compliance)
- Enable data for:
  - AI / ML initiatives
  - RAG (Retrieval-Augmented Generation) pipelines
  - Vectorized and unstructured data processing
- Collaborate with cross-functional teams across Data, Analytics, and Business units
- Provide technical leadership and mentorship (no direct reports initially)
Requirements:
- 7+ years of hands-on data engineering experience
- Strong experience with Microsoft Azure (required)
- Proven track record building production data pipelines across complex systems
- Strong SQL and Python skills (PySpark/Spark preferred)
- Experience with:
  - Data ingestion (API, SFTP, batch, streaming)
  - Data modeling and transformation
  - Pipeline reliability, monitoring, and performance optimization
- Hands-on experience with Microsoft Fabric (end-to-end pipelines)
- Experience supporting AI/ML or advanced analytics use cases
- Experience with RAG pipelines and LLM-based systems
- Knowledge of:
  - Prompt engineering
  - Semantic search / vector databases
  - Retrieval pipelines and chunking strategies
- Familiarity with agent frameworks such as LangGraph, CrewAI, or similar
- Experience building production AI/data workflows (not just POCs)