McKesson's Sarah Cannon Research Institute (SCRI) is seeking a Senior Data Engineer to support strategic data needs in advancing oncology treatments. The role involves designing, building, and maintaining data engineering processes while collaborating with cross-functional teams to develop efficient data pipelines and support data modernization initiatives.
Responsibilities:
- Design and implement scalable and efficient data pipelines to support various data-driven initiatives
- Design and maintain Databricks Lakehouse pipelines across Bronze/Silver/Gold (Delta) layers, producing governed, ML-ready datasets with built-in data quality checks and lineage
- Collaborate with cross-functional teams to understand data requirements and contribute to the development of data architectures
- Work on data integration projects, ensuring seamless and optimized data flow between systems
- Implement best practices for data engineering, ensuring data quality, reliability, and performance
- Contribute to data modernization efforts by leveraging cloud solutions and optimizing data processing workflows
- Demonstrate technical leadership by staying abreast of emerging data engineering technologies and implementing industry best practices
- Provide technical leadership in enabling AI/ML initiatives by designing scalable, reliable, and well-governed data engineering solutions
- Effectively communicate technical concepts to both technical and non-technical stakeholders
- Manage promotions to different environments using GitHub CI/CD with GitHub Actions and Liquibase
- Participate in the evaluation and identification of new technologies
Requirements:
- Deep technical expertise in building and optimizing data pipelines and large-scale processing systems
- Deep technical expertise with Azure Cloud and Databricks
- Experience working with cloud solutions and contributing to data modernization efforts
- Strong programming skills (e.g., SQL, Python, PySpark, or Scala) for data manipulation and transformation
- Excellent understanding of data engineering principles, data architecture, and database management
- Excellent understanding of data modeling concepts and data structures
- Excellent understanding of source-to-target data mappings
- Strong experience building AI/ML data pipelines in Databricks
- Proficient in leveraging GenAI and agentic frameworks to develop data engineering solutions
- Strong problem-solving skills and attention to detail
- Excellent communication skills, with the ability to convey technical concepts to both technical and non-technical stakeholders
- Bachelor's degree in a related field (e.g., Computer Science, Information Technology, Data Science)
- Knowledge of healthcare or clinical research industries is a plus
- Strong technical aptitude and experience with a wide variety of technologies
- Ability to rapidly learn and, if required, evaluate a new tool or technology
- Strong verbal & written communication skills
- Demonstrated technical experience
- Innovative thinking
- Strong customer and quality focus