Thermo Fisher Scientific Inc., a leading company in the scientific industry, is seeking a Scientist III, Data Engineer to develop scalable data pipelines and build new API integrations. The role involves owning and delivering projects associated with data platform solutions and implementing improvements that enhance data delivery and infrastructure scalability.
Responsibilities:
- Develop scalable data pipelines and build out new API integrations to support continuing increases in data volume and complexity
- Own and deliver projects and enhancements associated with data platform solutions
- Develop solutions using PySpark/EMR, SQL and databases, AWS Athena, S3, Redshift, AWS API Gateway, Lambda, Glue, and other data engineering technologies (a minimal sketch of this kind of pipeline step follows this list)
- Write and edit complex queries as required for implementing ETL/data solutions
- Implement solutions using AWS along with supporting DevOps and collaboration tools, including GitHub, Jenkins, Terraform, Jira, and Confluence
- Follow agile development methodologies to deliver solutions and product features, applying DevOps, DataOps, and DevSecOps practices
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, and re-designing infrastructure for greater scalability
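
A minimal sketch of the kind of PySpark pipeline step described above, assuming a Spark environment (such as EMR or Glue) with read/write access to S3. The bucket names, paths, and column names (event_id, event_ts) are hypothetical placeholders, not part of the posting.

```python
# Hypothetical PySpark ETL step: read raw JSON landed in S3, clean it, and
# write partitioned Parquet for downstream Athena/Redshift Spectrum queries.
# All paths and column names below are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("instrument-events-etl").getOrCreate()

# Read raw instrument events landed in S3 (hypothetical path and schema).
raw = spark.read.json("s3://example-raw-bucket/instrument-events/")

# Basic cleanup: drop duplicate events, normalize timestamps, filter bad rows.
cleaned = (
    raw.dropDuplicates(["event_id"])
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .filter(F.col("event_ts").isNotNull())
)

# Partition by date so Athena/Redshift Spectrum can prune scans.
(
    cleaned.withColumn("event_date", F.to_date("event_ts"))
           .write.mode("overwrite")
           .partitionBy("event_date")
           .parquet("s3://example-curated-bucket/instrument-events/")
)
```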
Requirements:
- Master's degree or foreign degree equivalent in Technology Management, Information Technology, Computer Science, or related field of study, plus 3 years of experience as a Data Developer, Data Engineer, or related occupation; OR
- Bachelor's degree or foreign degree equivalent in Technology Management, Information Technology, Computer Science, or related field of study, plus 5 years of experience as a Data Developer, Data Engineer, or related occupation
- Full life cycle implementation in AWS using PySpark/EMR, Athena, S3, Redshift, AWS API Gateway, Lambda, and Glue
- Agile development methodologies following DevOps, DataOps, and DevSecOps practices
- ETL Pipelines, GitHub, Jenkins, Terraform, Jira, Bitbucket, and Confluence
- Informatica, Databricks, and AWS Glue
- Data lakes using AWS Databricks, Apache Spark, and Python
- Data visualization tools such as Power BI and Tableau
- Data modeling and optimization for OLAP/OLTP systems with Star/Snowflake schemas
- Strong knowledge of SQL, query optimization, and performance tuning in Redshift, Snowflake, or Oracle
- Experience with CI/CD pipelines for data workflows using Jenkins, GitHub Actions, or AWS CodePipeline
- Data governance, cataloging, and lineage using tools such as AWS Glue Data Catalog, Collibra, or Alation
- Implementing data security, encryption, IAM policies, and regulatory compliance frameworks
- Batch and real-time streaming pipelines using Kafka and Spark Streaming (a combined streaming and Delta Lake sketch follows this list)
- Managing data governance, access control, and lineage using Databricks Unity Catalog for secure enterprise data sharing
- Implementing Delta Lake architecture for ACID transactions, schema enforcement, and scalable data pipelines
- Optimizing Delta Live Tables for automated ETL orchestration and reliable data delivery
- Ensuring high availability and SLA-driven production support with proactive monitoring, incident management, and root cause analysis
- Collaboration with cross-functional teams to translate scientific, laboratory, and business requirements into scalable data solutions
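
The Kafka/Spark Streaming and Delta Lake items above can be illustrated together. A minimal sketch, assuming a Databricks or Spark environment with the Kafka connector and Delta Lake libraries available (spark-sql-kafka and delta-spark); the broker address, topic, message schema, and storage paths are hypothetical placeholders.

```python
# Hypothetical streaming ingest: consume JSON lab results from Kafka and
# append them to a Delta table, which provides ACID writes and schema
# enforcement. Broker, topic, schema, and paths are illustrative only.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("lab-results-stream").getOrCreate()

# Expected message schema (hypothetical).
schema = StructType([
    StructField("sample_id", StringType()),
    StructField("assay", StringType()),
    StructField("value", DoubleType()),
])

# Read the Kafka topic as a streaming DataFrame and parse the JSON payload.
events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "lab-results")
         .load()
         .select(F.from_json(F.col("value").cast("string"), schema).alias("r"))
         .select("r.*")
)

# Append into a Delta table; the checkpoint makes the stream restartable.
query = (
    events.writeStream.format("delta")
          .option("checkpointLocation", "s3://example-bucket/checkpoints/lab-results/")
          .outputMode("append")
          .start("s3://example-bucket/delta/lab_results/")
)
```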