Build and maintain scalable data pipelines using Spark, Databricks, and cloud platforms
Design data models for analytics, ML, and AI applications
Drive adoption of AI tools and agentic workflows within the data engineering team
Identify and implement ways to improve engineering efficiency using AI
Prototype and scale AI-assisted development practices
Act as a go-to expert for AI experimentation and knowledge sharing
Help establish best practices and contribute to an AI-focused community or guild
Build pipelines supporting ML models, LLM applications, and AI workflows
Ensure data quality, observability, and reliability
Collaborate with Product, Data Science, ML/AI, and DevOps teams

3+ years of commercial experience in data engineering
Strong proficiency in SQL and Python (development and optimization)
Hands-on experience with Spark/PySpark (Databricks is a plus)
Experience with cloud data platforms (Azure preferred: ADF, Synapse, ADLS, Event Hub)
Solid understanding of ETL/ELT, data modeling, and data warehousing
Experience with orchestration tools (Airflow, ADF)
Understanding of reliability, performance, and production-grade systems
Hands-on experience using AI coding tools (Copilot, Cursor, Claude Code, etc.) in real workflows
Experience delivering at least one project with AI-assisted development
Ability to structure tasks for AI tools and critically validate their output
Upper-Intermediate level of English for effective communication

WILL BE A PLUS
Experience configuring AI development environments (agents, integrations, workflows)
Familiarity with LLMs, embeddings, and RAG architectures
Experience with vector databases (pgvector, FAISS, etc.)
Familiarity with AI/agent frameworks (LangChain, LlamaIndex, etc.)
Experience with dbt, Kafka, and BI tools
Experience with data quality tooling (Great Expectations, Soda, etc.)
Multi-cloud experience (AWS/GCP)
Interest in advanced topics (evaluation, reranking, drift detection, synthetic data)
Contributions to AI/data tooling or open source

AI Data Engineer
