Develop and maintain scalable ETL/ELT data pipelines that extract, transform, and load data from multiple sources, using Pentaho Data Integration as the primary platform.
Integrate new data sources into existing systems, ensuring data quality, consistency, and availability for analytics and business teams (e.g., for Power BI dashboards).
Optimize and monitor the performance of data processes, tuning and adjusting them as needed to keep pipelines efficient, reliable, and low-latency.
Collaborate with the Automation Architect and BI/Analytics teams to understand data requirements for automation and reporting projects, and deliver solutions aligned to those needs.
Implement data governance, security, and compliance practices, ensuring that pipelines adhere to data protection policies and that data quality is continuously verified (e.g., monitoring for job failures and missing data).
Document developed data processes and flows, keeping clear records of transformations, pipeline configurations, and architecture to facilitate maintenance and future evolution.
Requirements
Prior experience in Data Engineering (commensurate with the seniority of the role), building data pipelines and integration processes in production environments.
Strong experience with ETL and data integration tools, particularly Pentaho Data Integration, for developing extraction, transformation, and loading jobs.
Proficiency in programming, especially Python, for data manipulation and workflow automation.
Solid SQL skills and experience with relational databases (complex query design, query optimization, and data modeling).
Experience with cloud data environments, e.g., working with data platforms and services on AWS, Azure, or GCP, including cloud storage and data processing.
Knowledge of data architecture and Data Warehousing, including dimensional and conceptual modeling, understanding of ETL/ELT processes, and best practices for building reliable data pipelines.
Strong problem-solving and root-cause-analysis skills for data pipelines, with the ability to debug job failures, handle exceptions in ETL processes, and ensure the integrity of delivered data.