Hagerty is the world’s largest insurer of collectible and enthusiast vehicles, dedicated to keeping driving fun. As a Data Engineer II, you will develop and maintain data pipelines and cloud-based infrastructure supporting Hagerty’s Enterprise Data Hub, collaborating with engineers to enable data-driven decision making.
Responsibilities:
- Implement best practices around software development and big data engineering
- Develop and implement robust and scalable data pipelines using Python, SQL, parallel processing frameworks, and other AWS/Salesforce cloud solutions
- Develop and implement batch data pipelines using tools such as Apache Airflow, Snowflake, and numerous AWS products (EC2, Fargate, ECS, Lambda, and RDS)
- Develop streaming data integrations to support products across the Hagerty portfolio and enable real-time reporting
- Develop Enterprise Data Hub platform infrastructure using Terraform infrastructure-as-code
- Develop and support Hagerty’s cloud-based data warehouse to enable analytics and product reporting
- Partner with internal and external stakeholders to collect requirements, recommend best practice solutions, and productionize new data ingestions/analytic workloads
- Develop solutions to catalog and manage metadata to support data governance and data democratization
- Partner with Data Quality Engineers to define and implement automated test cases and data reconciliation to validate ETL processes and data quality & integrity
- Mentor junior team members in software and big data engineering best practices
- Partner with Data Scientists to design, code, train, test, deploy, and iterate machine learning algorithms and systems at scale
Requirements:
- You have strong problem-solving abilities and attention to detail
- You can communicate authentically and effectively, both in writing and verbally, with various stakeholders
- You create and share technical artifacts and documentation to support development and maintenance of data products
- You have a track record of delivering data products as production-grade software solutions
- You ensure quality through rigorous code development, testing, automation, and other software engineering best practices
- You have experience developing solutions using Python and cloud-based infrastructure (AWS, Azure, or GCP)
- You have demonstrated experience with imperative (e.g., Apache Airflow/NiFi) or declarative (e.g., Informatica/Talend/Pentaho) ETL design, implementation, and maintenance
- You have functional knowledge of relational databases and query authoring (SQL)
- Associate’s degree, preferably in a technical/analytical field, or relevant work experience
- An additional 3+ years working in another role within an IT delivery team, such as developer, engineer, data analyst, quality assurance analyst, ETL developer, or DBA
- Experience developing infrastructure as code in a cloud-based environment (Terraform experience preferred)
- Experience cataloging and processing non-relational data
- Experience with open-source data processing technologies such as streaming services (Kafka/SQS), big data processing frameworks (MapReduce/Spark), and big data file stores (EMRFS/HDFS)
- Experience evaluating different data container formats (JSON, delimited files, Avro, Parquet) based on workload needs
- Experience with container-based development