In your day-to-day, we expect you to work on the following areas:
Support and Optimization of Pipelines (ETL/ELT): Lead the planning, construction and evolution of data pipelines on Databricks, ensuring robustness, performance and scalability.
Develop, validate and optimize SQL and PySpark scripts, serving as a technical reference for the team.
Define and apply strategies to optimize cost and cluster consumption, recommending configurations and best practices.
Design tables and data structures for high performance and ease of consumption (analytics, reports, APIs).
Data Quality Gates: Define and implement data quality gates and rules in Databricks pipelines and Delta Live Tables (DLT); a minimal sketch of such a gate follows this list.
Establish and monitor data quality metrics, ensuring that only reliable data reaches the consumption layers.
Investigate and remediate data quality issues in collaboration with business areas and technical teams.
Modeling and Metadata: Perform logical and physical modeling in the Lakehouse.
Keep the data model repository up to date.
Maintain documentation and metadata catalogs.
Databricks Administration and Security: Support administration of the Databricks Workspace.
Manage access profiles.
Apply security and governance policies.
Data Governance: Contribute to governance practices and standardization of data assets.
Ensure data integrity, traceability and consistency.
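To make the data quality gate responsibilities concrete, here is a minimal sketch using Delta Live Tables expectations in PySpark. The table names, landing path, and rules are hypothetical illustrations, not part of this role's actual codebase:

import dlt
from pyspark.sql import functions as F

# In a DLT pipeline notebook, `spark` is provided by the runtime.

@dlt.table(comment="Raw orders ingested as-is from the landing zone.")
def orders_bronze():
    # Hypothetical landing path; a real pipeline would point at its own source.
    return spark.read.format("json").load("/mnt/landing/orders/")

@dlt.table(comment="Orders that passed the quality gate.")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # drop rows failing the rule
@dlt.expect_or_drop("positive_amount", "amount > 0")
@dlt.expect("recent_order", "order_date >= '2020-01-01'")      # warn only; row is kept
def orders_silver():
    return dlt.read("orders_bronze").withColumn("ingested_at", F.current_timestamp())

Rules declared with expect_or_drop keep failing rows out of the silver layer, while a plain expect only records a metric, which is one way to ensure that only reliable data reaches the consumption layers.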
Requirements
What we'll need for the perfect match:
Education: Bachelor's degree in Computer Science, Information Systems, Systems Analysis or related fields.
Experience: At least 6 years of proven experience in the field.
Platform and Architecture: Experience with Databricks (Workspace, clusters, notebooks, Delta Lake, DLT).
Languages and Tools: Proficiency in SQL and PL/SQL.
Experience with PySpark (Python + Spark) for building pipelines.
Experience writing performance-focused SQL and PySpark scripts (see the sketch after this list).
Modeling and Data: Experience in physical and logical data modeling.
Knowledge of dimensional modeling (Data Warehouse).
Experience managing metadata and organizing model repositories.
Databases: Strong knowledge of relational and non-relational DBMSs such as PostgreSQL, SQL Server, MongoDB, and DynamoDB.
Governance and Security: Knowledge of data governance, security rules, access and profile management, and information security policies.
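As a reference point for the performance focus above, here is a minimal PySpark sketch of a broadcast join; the table and column names are hypothetical:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("perf-join-sketch").getOrCreate()

facts = spark.read.table("sales_fact")    # large fact table (hypothetical)
stores = spark.read.table("store_dim")    # small dimension table (hypothetical)

daily_revenue = (
    facts
    .filter(F.col("sale_date") >= "2024-01-01")  # filter early so the engine can skip data
    .join(broadcast(stores), "store_id")         # broadcast the small side to avoid a shuffle
    .groupBy("sale_date", "region")
    .agg(F.sum("amount").alias("revenue"))
)

Broadcasting the small dimension table avoids shuffling the large fact table across the cluster, and filtering before the join lets the engine prune data early, two common habits behind performance-focused PySpark scripts.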
Tech Stack
DynamoDB
ETL
MongoDB
Postgres
PySpark
Python
Spark
SQL
Benefits
🏡 Fully remote – Work from the safety and comfort of your home 💙