Abacus Insights is transforming how data works for health plans. The company is seeking a Data Operations Engineer Intern to support production-grade data pipelines, monitor pipeline health, debug issues, and automate workflows.
Responsibilities:
- Monitor production data pipelines and systems, identifying failures, latency issues, schema changes, and data quality anomalies
- Debug pipeline failures by analyzing logs, metrics, SQL outputs, and upstream/downstream dependencies
- Assist in root cause analysis (RCA) for data incidents and contribute to implementing corrective and preventive solutions
- Support the maintenance and optimization of ETL/ELT workflows to improve reliability, scalability, and performance
- Automate recurring data operations tasks using Python, shell scripting, or similar tools to reduce manual intervention (a minimal freshness-check sketch follows this list)
- Assist with data mapping, transformation, and normalization efforts, including alignment with Master Data Management (MDM) systems
- Collaborate on the generation and validation of synthetic test datasets for pipeline testing and data quality validation
- Shadow senior engineers to deploy, monitor, and troubleshoot data workflows in AWS, Databricks, and Kubernetes-based environments
- Ensure data integrity and consistency across multiple environments (development, staging, production)
- Clearly document bugs, data issues, and operational incidents in Jira and Confluence, including reproduction steps, impact analysis, and resolution details
- Communicate effectively with cross-functional, onsite, and offshore teams to escalate issues, provide status updates, and track resolutions
- Participate in Agile ceremonies and follow structured incident and change management processes
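
To make the automation and monitoring bullets above concrete, here is a minimal sketch of the kind of recurring check this role might own: flagging tables whose most recent load breaches a freshness SLA. This is not Abacus's actual tooling; the table names, SLA thresholds, and the `load_audit` metadata table are all hypothetical placeholders, and a real implementation would query a warehouse or metrics store and page an on-call channel rather than print.

```python
"""Sketch of a recurring data-ops check: flag tables whose most recent
load is older than an SLA threshold. Table names, thresholds, and the
load_audit metadata table are hypothetical placeholders."""

import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical freshness SLAs per table, in hours.
SLA_HOURS = {"claims_raw": 6, "members_raw": 24, "eligibility_raw": 24}


def check_freshness(conn: sqlite3.Connection) -> list[str]:
    """Return alert messages for tables that breach their freshness SLA."""
    alerts = []
    now = datetime.now(timezone.utc)
    for table, max_age in SLA_HOURS.items():
        row = conn.execute(
            "SELECT MAX(loaded_at) FROM load_audit WHERE table_name = ?",
            (table,),
        ).fetchone()
        last_load = datetime.fromisoformat(row[0]) if row[0] else None
        if last_load is None:
            alerts.append(f"{table}: no loads recorded")
        elif now - last_load > timedelta(hours=max_age):
            alerts.append(
                f"{table}: last load {last_load:%Y-%m-%d %H:%M} UTC "
                f"exceeds {max_age}h SLA"
            )
    return alerts


if __name__ == "__main__":
    # Demo with an in-memory audit table standing in for real load metadata.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE load_audit (table_name TEXT, loaded_at TEXT)")
    stale = (datetime.now(timezone.utc) - timedelta(hours=30)).isoformat()
    conn.executemany(
        "INSERT INTO load_audit VALUES (?, ?)",
        [("claims_raw", datetime.now(timezone.utc).isoformat()),
         ("members_raw", stale)],
    )
    for alert in check_freshness(conn):
        print("ALERT:", alert)  # in practice: page or post to a team channel
```
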
Requirements:
- Strong interest in data engineering, data operations, and production data systems
- Currently pursuing or recently completed a Master's degree in Computer Science, Data Science, Engineering, Statistics, or a related quantitative discipline
- Solid understanding of ETL/ELT architectures, including ingestion, transformation, validation, orchestration, and error handling
- Proficiency in SQL, including complex joins, aggregations, window functions, and debugging data discrepancies at scale (see the window-function example after this list)
- Working knowledge of Python for data processing, automation, and operational tooling
- Familiarity with workflow orchestration tools such as Apache Airflow, including DAG design, scheduling, retries, and dependency management (a minimal DAG sketch also follows this list)
- Experience with or exposure to data integration platforms such as Airbyte, including connector-based ingestion, schema evolution, and sync monitoring
- Understanding of Master Data Management (MDM) concepts and tools, with exposure to platforms such as Rhapsody, Onyx, or other enterprise MDM solutions
- Knowledge of data pipeline observability, including log analysis, metrics, alerting, and debugging failed jobs
- Exposure to cloud platforms (preferably AWS), with familiarity in services such as S3, Lambda, EMR, EKS, or managed data processing services
- Ability to communicate technical issues clearly and concisely, including writing actionable bug reports and collaborating on incident resolution
- Strong documentation habits and attention to detail in operational workflows
- Experience with cloud data warehouses such as Snowflake or BigQuery
- Familiarity with Databricks, Apache Spark, or distributed data processing frameworks
- Hands-on experience building automation for data operations or reliability engineering
- Exposure to healthcare data standards, regulated data environments, or HIPAA-compliant systems
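
To gauge the SQL bar above, a representative debugging task is isolating superseded duplicate records with a window function. The sketch below uses a hypothetical `claims` table and runs the query through Python's built-in sqlite3 (which supports window functions) so it is self-contained; the schema and data are illustrative only.

```python
"""Sketch of debugging a data discrepancy with a window function:
find rows shadowed by a newer version of the same claim.
Table and column names are hypothetical."""

import sqlite3

DEDUP_SQL = """
WITH ranked AS (
    SELECT claim_id, amount, updated_at,
           ROW_NUMBER() OVER (
               PARTITION BY claim_id
               ORDER BY updated_at DESC
           ) AS rn
    FROM claims
)
SELECT claim_id, amount, updated_at
FROM ranked
WHERE rn > 1  -- rows shadowed by a newer version = the discrepancy
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (claim_id TEXT, amount REAL, updated_at TEXT)")
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?)",
    [("C-1", 120.0, "2024-05-01"), ("C-1", 135.0, "2024-05-03"),
     ("C-2", 80.0, "2024-05-02")],
)
for row in conn.execute(DEDUP_SQL):
    print("superseded row:", row)  # prints C-1's older 2024-05-01 version
```
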
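For the Airflow requirement, here is a minimal Airflow 2.x DAG sketch showing scheduling, retries, and a linear dependency. The DAG id, schedule, and task callables are hypothetical stand-ins, not a production pipeline.

```python
"""Minimal Airflow 2.x DAG sketch: daily schedule, automatic retries,
and a simple extract -> validate dependency. All names are hypothetical."""

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pull raw files")  # placeholder for a real ingestion step


def validate():
    print("run row-count and schema checks")  # placeholder validation step


with DAG(
    dag_id="example_ingest_validate",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={
        "retries": 2,                       # retry failed tasks twice
        "retry_delay": timedelta(minutes=5),
    },
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_validate = PythonOperator(task_id="validate", python_callable=validate)

    t_extract >> t_validate  # validate runs only after extract succeeds
```
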