Responsibilities
Design, develop, and evolve data pipelines using Databricks, ensuring high performance, reliability, and cost efficiency.
Maintain, structure, and expand the data architecture based on Azure Data Lake Gen2, ensuring governance, security, and end-to-end traceability.
Develop and optimize integrations with the company's AI stack, including vector databases and semantic search engines already in production.
Build robust pipelines that support both AI model training/consumption and analytical and Business Intelligence (BI) layers.
Serve as the team's technical reference, supporting junior and mid-level engineers through architectural guidance, code reviews, and the sharing of best practices.
Collaborate actively with product and AI teams on solution definition, participating in technical discussions and contributing to the evolution of the data architecture.
Requirements
Strong experience with Python as the primary development language.
Hands-on experience with Databricks, including job execution and orchestration, cluster optimization, and use of Delta Lake.
Advanced SQL skills, with a focus on data modeling, performance optimization, and complex transformations.
Experience with Azure Data Lake Gen2, including organizing layers (raw, trusted, refined), access control, and cost management in scalable environments.
Practical experience with vector databases and semantic search solutions in production environments.
Ability to engage consultatively with product and AI teams, contributing to architectural decisions rather than simply executing tasks.