Develop software programs, algorithms and automated processes that cleanse, integrate and evaluate large data sets from multiple disparate sources
Manipulate large amounts of data across a diverse set of subject areas, collaborating with other data scientists and data engineers to prepare data pipelines for various modeling protocols
Build, validate, and maintain AI (machine learning (ML) / deep learning) models; diagnose and optimize their performance; and develop statistical models for ad hoc, business-focused analysis
Communicate meaningful, actionable insights from large data and metadata sources to stakeholders
Take responsibility for the quality of services and advice provided to meet business partner needs
Take responsibility for the team's end results and share responsibility for resources, budget, and adherence to policies
Requirements
Advanced proficiency in R, Python, Spark, Hive (or other MapReduce tools), and common scripting languages for end-to-end (E2E) pipelines
Advanced proficiency in using SQL for efficient manipulation of large datasets in on-premises and cloud distributed computing environments, such as Azure
Experience with ML and classical predictive techniques such as logistic regression, decision trees, nonlinear regression, ANN/CNN, boosted trees, and SVM, and with tools such as TensorFlow and visualization packages, plus a track record of creating business impact with these methods
Ability to work at a detailed level and to summarize findings and extrapolate knowledge into strong recommendations for change
Ability to collaborate with cross-functional teams and influence the product and analytics roadmaps, with demonstrated proficiency in relationship building
Ability to assess relatively complex situations and analyze data to make judgments and recommend solutions