Implement, test, and maintain Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) pipelines to ingest data from on-prem systems and cloud sources into data lakes/warehouses
Develop data transformations and data models to support reporting, analytics, and Machine Learning (ML) workloads
Build and maintain batch and streaming data workflows using orchestration tools (Airflow, Cloud Composer, Prefect, etc.) and data integration platforms (Informatica, Talend, Fivetran, etc.)
Work with on-prem technologies (databases, file shares, middleware) and cloud services (BigQuery, Redshift, Cloud Storage, S3, Dataflow, Glue) to move and transform data
Collaborate with data owners to profile data, identify quality issues, and implement data validation and cleansing rules
Design and implement data cataloging, lineage, and basic governance controls for datasets
Monitor pipeline health, troubleshoot failures, and implement alerts and automated retries
Contribute to Continuous Integration and Continuous Delivery (CI/CD) for data pipelines and infrastructure-as-code patterns (Terraform, Cloud Deployment Manager, etc.)
Produce clear technical documentation and runbooks for operational support and handoffs
Support production handover, incident response, and post-incident retrospectives
Requirements
1+ years of experience building data pipelines and ETL/ELT processes or performing related data engineering tasks
1+ years of experience with SQL
Experience designing queries and data models for analytics
Experience with scripting languages (Python or R) for data processing and automation
Experience with orchestration tools (Airflow, Prefect, Cloud Composer)
Experience with relational databases (Oracle, SQL Server, Postgres)
Experience with cloud data services (BigQuery, Redshift, Snowflake, or equivalent)
Experience with data quality, basic data governance, and logging/monitoring concepts