Monitor and maintain the health and efficiency of data pipelines
Troubleshoot and perform root cause analysis for data discrepancies and pipeline issues
Communicate with data providers to understand data discrepancies and manage changes in data delivery
Implement fixes and enhancements to improve data quality and pipeline performance
Collaborate with data scientists and analysts to understand data needs and implement effective data solutions
Develop strategies for data validation and quality assurance
Optimize data flow and collection to improve system efficiency
Document and manage data pipeline architectures, including maintenance and update protocols
Use tools such as SQL, version control and CI/CD, containerization, task schedulers, Python frameworks, and cloud services for data pipeline management
Ensure compliance with data governance and security standards
Requirements
Bachelor’s degree required
Minimum one year of research computing experience required
Basic Linux use and administration: system layout, file permissions, shell, utilities (syslog, cron), diagnostic tools (ps, htop, grep, lsof)
Experience with Apache Airflow, preferably version 3.0
Basic database use, especially with Postgres
Basic scripting (Python, Bash)
Team software development (git/GitHub, Jira, code reviews, agile methodologies)
Data analysis: diagnosing and fixing runtime errors and logic bugs; performing basic growth projections to predict future problems; communicating results
Tech Stack
Airflow
Apache
Cloud
Linux
Postgres
Python
SQL
Benefits
Comprehensive medical, prescription, dental, and vision insurance
Generous retirement savings program with employer contributions
Tuition benefits
Ample paid time off and observed holidays
Life and accidental death and disability insurance
Free Pittsburgh Regional Transit bus pass
Access to our Family Concierge Team to help navigate childcare needs