Hagerty is the world’s largest insurer of collectible and enthusiast vehicles, dedicated to keeping driving fun. As a Data Engineer II, you will develop and maintain data pipelines and cloud-based infrastructure supporting Hagerty’s Enterprise Data Hub, collaborating with engineers to enable data-driven decision making.
Responsibilities:
- Implement best practices around software development and big data engineering
- Develop and implement robust and scalable data pipelines using Python, SQL, parallel processing frameworks, and other AWS/Salesforce cloud solutions
- Develop and implement batch data pipelines using tools such as Apache Airflow, Snowflake, and numerous AWS products (EC2, Fargate, ECS, Lambda, and RDS)
- Develop streaming data integrations to support products across the Hagerty portfolio and enable real-time reporting
- Develop Enterprise Data Hub platform infrastructure using Terraform infrastructure-as-code
- Develop and support Hagerty’s cloud-based data warehouse to enable analytics and product reporting
- Partner with internal and external stakeholders to collect requirements, recommend best practice solutions, and productionize new data ingestions/analytic workloads
- Develop solutions to catalog and manage metadata to support data governance and data democratization
- Partner with Data Quality Engineers to define and implement automated test cases and data reconciliation to validate ETL processes and data quality & integrity
- Mentor junior team members in software and big data engineering best practices
- Partner with Data Scientists to design, code, train, test, deploy, and iterate machine learning algorithms and systems at scale
Requirements:
- You have strong problem-solving abilities and attention to detail
- You can communicate authentically and effectively, both in writing and verbally, with various stakeholders
- You create and share technical artifacts and documentation to support development and maintenance of data products
- You have a track record of delivering data products as production-grade software solutions
- You ensure quality through rigorous code development, testing, automation, and other software engineering best practices
- You have experience developing solutions using Python and cloud-based infrastructure (AWS, Azure, or GCP)
- You have demonstrated experience with imperative (e.g., Apache Airflow/NiFi) or declarative (e.g., Informatica/Talend/Pentaho) ETL design, implementation, and maintenance
- You have functional knowledge of relational databases and query authoring (SQL)
- Associate’s degree, preferably in a technical/analytical field, or relevant work experience
- An additional 3+ years working in another role within an IT delivery team, such as developer, engineer, data analyst, quality assurance analyst, ETL developer, or DBA
- Experience developing infrastructure as code in a cloud-based environment (Terraform experience preferred)
- Experience cataloging and processing non-relational data
- Experience with open-source data processing technologies such as streaming services (Kafka/SQS), big data processing frameworks (MapReduce/Spark), and big data file stores (EMRFS/HDFS)
- Experience evaluating different data container formats (JSON, delimited files, Avro, Parquet) based on workload needs
- Experience with container-based development