Thermo Fisher Scientific Inc., a leading company in the scientific industry, is seeking a Scientist III, Data Engineer to develop scalable data pipelines and build new API integrations. The role involves owning and delivering projects associated with data platform solutions and implementing improvements that enhance data delivery and infrastructure scalability.
Responsibilities:
- Develop scalable data pipelines and build out new API integrations to support continuing increases in data volume and complexity
- Own and deliver projects and enhancements associated with data platform solutions
- Develop solutions using PySpark/EMR, SQL and databases, AWS Athena, S3, Redshift, AWS API Gateway, Lambda, Glue, and other data engineering technologies (a minimal sketch of this kind of pipeline step follows this list)
- Write and edit complex queries as required for implementing ETL/data solutions
- Implement solutions using AWS along with supporting DevOps and collaboration tools, including GitHub, Jenkins, Terraform, Jira, and Confluence
- Follow agile development methodologies to deliver solutions and product features, applying DevOps, DataOps, and DevSecOps practices
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, and re-designing infrastructure for greater scalability
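
A minimal sketch of the kind of PySpark pipeline step described above, assuming a Spark environment (such as EMR or Glue) with read/write access to S3. The bucket names, paths, and column names (event_id, event_ts) are hypothetical placeholders, not part of the posting.

```python
# Hypothetical PySpark ETL step: read raw JSON landed in S3, clean it, and
# write partitioned Parquet for downstream Athena/Redshift Spectrum queries.
# All paths and column names below are illustrative placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("instrument-events-etl").getOrCreate()

# Read raw instrument events landed in S3 (hypothetical path and schema).
raw = spark.read.json("s3://example-raw-bucket/instrument-events/")

# Basic cleanup: drop duplicate events, normalize timestamps, filter bad rows.
cleaned = (
    raw.dropDuplicates(["event_id"])
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .filter(F.col("event_ts").isNotNull())
)

# Partition by date so Athena/Redshift Spectrum can prune scans.
(
    cleaned.withColumn("event_date", F.to_date("event_ts"))
           .write.mode("overwrite")
           .partitionBy("event_date")
           .parquet("s3://example-curated-bucket/instrument-events/")
)
```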
Requirements:
- Master's degree or foreign degree equivalent in Technology Management, Information Technology, Computer Science, or related field of study, plus 3 years of experience as a Data Developer, Data Engineer, or related occupation; OR
- Bachelor's degree or foreign degree equivalent in Technology Management, Information Technology, Computer Science, or related field of study, plus 5 years of experience as a Data Developer, Data Engineer, or related occupation
- Full life cycle implementation in AWS using PySpark/EMR, Athena, S3, Redshift, AWS API Gateway, Lambda, and Glue
- Agile development methodologies following DevOps, DataOps, and DevSecOps practices
- ETL Pipelines, GitHub, Jenkins, Terraform, Jira, Bitbucket, and Confluence
- Informatica, Databricks, and AWS Glue
- Data lakes using AWS Databricks, Apache Spark, and Python
- Data visualization tools such as Power BI and Tableau
- Data modeling and optimization for OLAP/OLTP systems with Star/Snowflake schemas
- Strong knowledge of SQL, query optimization, and performance tuning in Redshift, Snowflake, or Oracle
- Experience with CI/CD pipelines for data workflows using Jenkins, GitHub Actions, or AWS CodePipeline
- Data governance, cataloging, and lineage using tools such as AWS Glue Data Catalog, Collibra, or Alation
- Implementing data security, encryption, IAM policies, and regulatory compliance frameworks
- Batch and real-time streaming pipelines using Kafka and Spark Streaming (a combined streaming and Delta Lake sketch follows this list)
- Managing data governance, access control, and lineage using Databricks Unity Catalog for secure enterprise data sharing
- Implementing Delta Lake architecture for ACID transactions, schema enforcement, and scalable data pipelines
- Optimizing Delta Live Tables for automated ETL orchestration and reliable data delivery
- Ensuring high availability and SLA-driven production support with proactive monitoring, incident management, and root cause analysis
- Collaboration with cross-functional teams to translate scientific, laboratory, and business requirements into scalable data solutions
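
The Kafka/Spark Streaming and Delta Lake items above can be illustrated together. A minimal sketch, assuming a Databricks or Spark environment with the Kafka connector and Delta Lake libraries available (spark-sql-kafka and delta-spark); the broker address, topic, message schema, and storage paths are hypothetical placeholders.

```python
# Hypothetical streaming ingest: consume JSON lab results from Kafka and
# append them to a Delta table, which provides ACID writes and schema
# enforcement. Broker, topic, schema, and paths are illustrative only.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("lab-results-stream").getOrCreate()

# Expected message schema (hypothetical).
schema = StructType([
    StructField("sample_id", StringType()),
    StructField("assay", StringType()),
    StructField("value", DoubleType()),
])

# Read the Kafka topic as a streaming DataFrame and parse the JSON payload.
events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "lab-results")
         .load()
         .select(F.from_json(F.col("value").cast("string"), schema).alias("r"))
         .select("r.*")
)

# Append into a Delta table; the checkpoint makes the stream restartable.
query = (
    events.writeStream.format("delta")
          .option("checkpointLocation", "s3://example-bucket/checkpoints/lab-results/")
          .outputMode("append")
          .start("s3://example-bucket/delta/lab_results/")
)
```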