Abacus Insights is transforming how data works for health plans. The company is seeking a Data Operations Engineer Intern to support production-grade data pipelines, monitor pipeline health, debug issues, and automate workflows.
Responsibilities:
- Monitor production data pipelines and systems, identifying failures, latency issues, schema changes, and data quality anomalies
- Debug pipeline failures by analyzing logs, metrics, SQL outputs, and upstream/downstream dependencies
- Assist in root cause analysis (RCA) for data incidents and contribute to implementing corrective and preventive solutions
- Support the maintenance and optimization of ETL/ELT workflows to improve reliability, scalability, and performance
- Automate recurring data operations tasks using Python, shell scripting, or similar tools to reduce manual intervention (a minimal freshness-check sketch follows this list)
- Assist with data mapping, transformation, and normalization efforts, including alignment with Master Data Management (MDM) systems
- Collaborate on the generation and validation of synthetic test datasets for pipeline testing and data quality validation
- Shadow senior engineers to deploy, monitor, and troubleshoot data workflows in AWS, Databricks, and Kubernetes-based environments
- Ensure data integrity and consistency across multiple environments (development, staging, production)
- Clearly document bugs, data issues, and operational incidents in Jira and Confluence, including reproduction steps, impact analysis, and resolution details
- Communicate effectively with cross-functional, onsite, and offshore teams to escalate issues, provide status updates, and track resolutions
- Participate in Agile ceremonies and follow structured incident and change management processes
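
To make the automation and monitoring bullets above concrete, here is a minimal sketch of the kind of recurring check this role might own: flagging tables whose most recent load breaches a freshness SLA. This is not Abacus's actual tooling; the table names, SLA thresholds, and the `load_audit` metadata table are all hypothetical placeholders, and a real implementation would query a warehouse or metrics store and page an on-call channel rather than print.

```python
"""Sketch of a recurring data-ops check: flag tables whose most recent
load is older than an SLA threshold. Table names, thresholds, and the
load_audit metadata table are hypothetical placeholders."""

import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical freshness SLAs per table, in hours.
SLA_HOURS = {"claims_raw": 6, "members_raw": 24, "eligibility_raw": 24}


def check_freshness(conn: sqlite3.Connection) -> list[str]:
    """Return alert messages for tables that breach their freshness SLA."""
    alerts = []
    now = datetime.now(timezone.utc)
    for table, max_age in SLA_HOURS.items():
        row = conn.execute(
            "SELECT MAX(loaded_at) FROM load_audit WHERE table_name = ?",
            (table,),
        ).fetchone()
        last_load = datetime.fromisoformat(row[0]) if row[0] else None
        if last_load is None:
            alerts.append(f"{table}: no loads recorded")
        elif now - last_load > timedelta(hours=max_age):
            alerts.append(
                f"{table}: last load {last_load:%Y-%m-%d %H:%M} UTC "
                f"exceeds {max_age}h SLA"
            )
    return alerts


if __name__ == "__main__":
    # Demo with an in-memory audit table standing in for real load metadata.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE load_audit (table_name TEXT, loaded_at TEXT)")
    stale = (datetime.now(timezone.utc) - timedelta(hours=30)).isoformat()
    conn.executemany(
        "INSERT INTO load_audit VALUES (?, ?)",
        [("claims_raw", datetime.now(timezone.utc).isoformat()),
         ("members_raw", stale)],
    )
    for alert in check_freshness(conn):
        print("ALERT:", alert)  # in practice: page or post to a team channel
```
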
Requirements:
- Strong interest in data engineering, data operations, and production data systems
- Currently pursuing or recently completed a Master's degree in Computer Science, Data Science, Engineering, Statistics, or a related quantitative discipline
- Solid understanding of ETL/ELT architectures, including ingestion, transformation, validation, orchestration, and error handling
- Proficiency in SQL, including complex joins, aggregations, window functions, and debugging data discrepancies at scale (see the window-function example after this list)
- Working knowledge of Python for data processing, automation, and operational tooling
- Familiarity with workflow orchestration tools such as Apache Airflow, including DAG design, scheduling, retries, and dependency management (a minimal DAG sketch also follows this list)
- Experience with or exposure to data integration platforms such as Airbyte, including connector-based ingestion, schema evolution, and sync monitoring
- Understanding of Master Data Management (MDM) concepts and tools, with exposure to platforms such as Rhapsody, Onyx, or other enterprise MDM solutions
- Knowledge of data pipeline observability, including log analysis, metrics, alerting, and debugging failed jobs
- Exposure to cloud platforms (preferably AWS), with familiarity in services such as S3, Lambda, EMR, EKS, or managed data processing services
- Ability to communicate technical issues clearly and concisely, including writing actionable bug reports and collaborating on incident resolution
- Strong documentation habits and attention to detail in operational workflows
- Experience with cloud data warehouses such as Snowflake or BigQuery
- Familiarity with Databricks, Apache Spark, or distributed data processing frameworks
- Hands-on experience building automation for data operations or reliability engineering
- Exposure to healthcare data standards, regulated data environments, or HIPAA-compliant systems
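
To gauge the SQL bar above, a representative debugging task is isolating superseded duplicate records with a window function. The sketch below uses a hypothetical `claims` table and runs the query through Python's built-in sqlite3 (which supports window functions) so it is self-contained; the schema and data are illustrative only.

```python
"""Sketch of debugging a data discrepancy with a window function:
find rows shadowed by a newer version of the same claim.
Table and column names are hypothetical."""

import sqlite3

DEDUP_SQL = """
WITH ranked AS (
    SELECT claim_id, amount, updated_at,
           ROW_NUMBER() OVER (
               PARTITION BY claim_id
               ORDER BY updated_at DESC
           ) AS rn
    FROM claims
)
SELECT claim_id, amount, updated_at
FROM ranked
WHERE rn > 1  -- rows shadowed by a newer version = the discrepancy
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (claim_id TEXT, amount REAL, updated_at TEXT)")
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?)",
    [("C-1", 120.0, "2024-05-01"), ("C-1", 135.0, "2024-05-03"),
     ("C-2", 80.0, "2024-05-02")],
)
for row in conn.execute(DEDUP_SQL):
    print("superseded row:", row)  # prints C-1's older 2024-05-01 version
```
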
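For the Airflow requirement, here is a minimal Airflow 2.x DAG sketch showing scheduling, retries, and a linear dependency. The DAG id, schedule, and task callables are hypothetical stand-ins, not a production pipeline.

```python
"""Minimal Airflow 2.x DAG sketch: daily schedule, automatic retries,
and a simple extract -> validate dependency. All names are hypothetical."""

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pull raw files")  # placeholder for a real ingestion step


def validate():
    print("run row-count and schema checks")  # placeholder validation step


with DAG(
    dag_id="example_ingest_validate",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={
        "retries": 2,                       # retry failed tasks twice
        "retry_delay": timedelta(minutes=5),
    },
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_validate = PythonOperator(task_id="validate", python_callable=validate)

    t_extract >> t_validate  # validate runs only after extract succeeds
```
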