Monitor and maintain the health and efficiency of data pipelines
Troubleshoot and perform root cause analysis for data discrepancies and pipeline issues
Communicate with data providers to understand data discrepancies and manage changes in data delivery
Implement fixes and enhancements to improve data quality and pipeline performance
Collaborate with data scientists and analysts to understand data needs and implement effective data solutions
Develop strategies for data validation and quality assurance
Optimize data flow and collection to improve system efficiency
Document and manage data pipeline architectures, including maintenance and update protocols
Use tools such as SQL, version control and CI/CD, containerization, task schedulers, Python frameworks, and cloud services for data pipeline management
Ensure compliance with data governance and security standards

Bachelor’s Degree required
Minimum one year of research computing experience required
Basic Linux use and administration: system layout, file permissions, shell, utilities (syslog, cron), diagnostic tools (ps, htop, grep, lsof)
Experience with Apache Airflow, preferably version 3.0
Basic database use, especially with Postgres
Basic scripting (Python, Bash)
Team software development (git/GitHub, Jira, code reviews, agile methodologies)
Data analysis: diagnosing and fixing runtime errors and logic bugs; performing basic growth projections to predict future problems; communicating results

Data Pipeline Engineer – Computer Science

Key skills