Architect data lake/lakehouse platforms for forecasting, anomaly detection, and BI/analytics using Databricks, Spark, and Delta-style patterns.
Set and enforce engineering standards for data modeling, integration, and pipeline design across teams/products.
Lead cloud event-driven/microservices architecture that integrates with web front ends and APIs.
Design, build, and optimize batch and event-driven ELT/ETL pipelines for analytical and operational workloads.
Build ingestion/transformation flows aligned to bronze/silver/gold, with validation, CI/CD, testing, and observability baked in.
Implement scalable training and inference for time-series forecasting across categories.
Build model monitoring (RMSE/MAE/bias/coverage/residuals) with dashboards and automated alerts.
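As an illustrative sketch (not this team's actual implementation), the monitoring metrics named above can be computed from forecasts, actuals, and prediction intervals with nothing beyond the standard library:

```python
import math

def forecast_metrics(actuals, preds, lowers, uppers):
    """Basic forecast-monitoring metrics: RMSE, MAE, bias, and
    prediction-interval coverage (share of actuals inside [lower, upper])."""
    n = len(actuals)
    errors = [p - a for a, p in zip(actuals, preds)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    bias = sum(errors) / n  # positive = systematic over-forecasting
    coverage = sum(lo <= a <= hi for a, lo, hi in zip(actuals, lowers, uppers)) / n
    return {"rmse": rmse, "mae": mae, "bias": bias, "coverage": coverage}

# Hypothetical example: four periods of actuals vs. forecasts with intervals
m = forecast_metrics(
    actuals=[100, 110, 95, 105],
    preds=[102, 108, 97, 110],
    lowers=[90, 100, 85, 95],
    uppers=[115, 120, 110, 120],
)
```

In a production setup these values would be logged per category and per run, feeding the dashboards and alert thresholds described above.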
Develop residual/outlier detection using Z-scores, PELT change points, and confidence-interval breach checks.
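A minimal sketch of the Z-score piece of that detection (the PELT and interval-breach checks would sit alongside it; the threshold and data here are hypothetical):

```python
import statistics

def zscore_outliers(residuals, threshold=3.0):
    """Flag residuals whose Z-score exceeds the threshold.
    A fuller pipeline would pair this with change-point detection
    (PELT is implemented in the `ruptures` library) and
    confidence-interval breach checks."""
    mean = statistics.fmean(residuals)
    std = statistics.pstdev(residuals)
    if std == 0:
        return [False] * len(residuals)
    return [abs((r - mean) / std) > threshold for r in residuals]

# One large residual among small ones gets flagged at a 2-sigma threshold
flags = zscore_outliers([0.1, -0.2, 0.0, 0.3, -0.1, 8.0], threshold=2.0)
```

Note that a single extreme value inflates the standard deviation (masking), which is one reason robust variants (median/MAD) or change-point methods are often layered on top.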
Implement classification/risk-scoring models (e.g., logistic regression, random forest, XGBoost) and unsupervised techniques (clustering, HMMs) for anomaly classification and category risk.
Automate data quality and schema validation (missing dates/targets, type changes, schema evolution, allocation shifts).
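A lightweight sketch of the kind of checks meant here — missing dates, missing targets, and type changes — using only the standard library (field names and rows are hypothetical; a real pipeline would enforce these as guardrails before promotion):

```python
from datetime import date, timedelta

def validate_batch(rows, expected_schema):
    """Data-quality checks: missing targets, unexpected field types,
    and gaps in a daily date series."""
    issues = []
    for r in rows:
        if r.get("target") is None:
            issues.append(f"missing target for {r.get('ds')}")
    for r in rows:
        for field, typ in expected_schema.items():
            if field in r and r[field] is not None and not isinstance(r[field], typ):
                issues.append(f"type change: {field} is {type(r[field]).__name__}")
    dates = sorted(r["ds"] for r in rows)
    expected = {dates[0] + timedelta(days=i) for i in range((dates[-1] - dates[0]).days + 1)}
    for missing in sorted(expected - set(dates)):
        issues.append(f"missing date: {missing.isoformat()}")
    return issues

rows = [
    {"ds": date(2024, 1, 1), "target": 10.0},
    {"ds": date(2024, 1, 2), "target": None},    # missing target
    {"ds": date(2024, 1, 4), "target": "12.5"},  # type change; Jan 3 absent
]
issues = validate_batch(rows, {"target": float})
```

Each issue would typically be routed to an alert or block the batch, depending on severity.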
Detect drift in data and model behavior (e.g., allocation stability, category mix changes).
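One common way to quantify the category-mix drift mentioned above (an illustrative choice, not necessarily this team's method) is the Population Stability Index over category shares:

```python
import math

def psi(baseline, current, eps=1e-6):
    """Population Stability Index over category counts. Rule of thumb:
    < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant shift."""
    cats = set(baseline) | set(current)
    b_total = sum(baseline.values())
    c_total = sum(current.values())
    score = 0.0
    for cat in cats:
        b = max(baseline.get(cat, 0) / b_total, eps)
        c = max(current.get(cat, 0) / c_total, eps)
        score += (c - b) * math.log(c / b)
    return score

# Hypothetical category counts: a near-identical mix vs. a reshuffled one
stable = psi({"A": 50, "B": 30, "C": 20}, {"A": 52, "B": 28, "C": 20})
shifted = psi({"A": 50, "B": 30, "C": 20}, {"A": 20, "B": 30, "C": 50})
```

Comparing the score against fixed thresholds gives a simple per-category drift alert.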
Deliver per-category anomaly flags and summarized insights for decision-makers.
Provide technical leadership (code/design reviews, mentoring, knowledge sharing) for engineers and data scientists.
Collaborate with product/UX/business to refine requirements, prioritize work, and plan roadmaps.
Champion engineering excellence: quality, performance, and operational readiness.
Requirements
Typically, you’ll bring 7+ years of experience in data engineering, machine learning, and building production-grade data pipelines and platforms in a cloud environment.
A Bachelor’s degree in Computer Science, Engineering, Statistics, Mathematics, or a related field is required; a Master’s degree is a plus.
Preferred qualifications include experience with Databricks/Spark and lakehouse architectures, along with relevant cloud certifications (e.g., Azure/AWS/GCP data engineering) or Databricks certifications.
Expert SQL (complex joins, window functions, performance tuning at scale).
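To illustrate the window-function skill level expected, here is a small self-contained example (table and data are hypothetical) using SQLite, which supports standard `OVER` / `PARTITION BY` syntax:

```python
import sqlite3

# Rank revenue within each category as a per-category running total
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, month TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("A", "2024-01", 100), ("A", "2024-02", 120),
     ("B", "2024-01", 80),  ("B", "2024-02", 90)],
)
rows = conn.execute("""
    SELECT category, month, revenue,
           SUM(revenue) OVER (
               PARTITION BY category ORDER BY month
           ) AS running_total
    FROM sales
    ORDER BY category, month
""").fetchall()
```

The same pattern (with ranking, lag/lead, and frame clauses) carries over directly to warehouse engines and Spark SQL.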
Strong Python for data processing, APIs/microservices, and analytics.
Hands-on PySpark for large-scale distributed processing and ETL/ELT.
Solid data modeling (dimensional models, star/snowflake schemas, medallion design).
Familiarity with end-to-end predictive analytics, including model validation and monitoring.
Experience building monitoring for models/pipelines (metrics, dashboards, alerts), focused on error trends and drift.
Strong background in production data quality and schema monitoring, including automated checks/guardrails.
Familiar with CI/CD, version control, testing, and observability for data/ML systems.
Experience integrating data/ML back-end services with web front ends (APIs, payloads, error handling).
Understanding of authentication, authorization, and security for data/ML APIs in the cloud.
Technical Leadership: Lead design discussions, set standards, and mentor engineers/data scientists.
Work comfortably with product, business, and UX partners; turn ambiguity into clear technical roadmaps.
Explain trade-offs/metrics to non-technical stakeholders; strong documentation habits.
Analytical approach to data quality, drift, and performance issues; pragmatic solutions.
Take end-to-end responsibility, from design and implementation to monitoring and improvement in production.
Work cross-functionally, support research teams/interns, and elevate engineering/analytics maturity.
Iterate through MVPs, experiment when justified, and adapt to evolving priorities.
Tech Stack
AWS
Azure
Cloud
ETL
Google Cloud Platform
gRPC
Microservices
PySpark
Python
Spark
SQL
Benefits
Health & Wellness: Health care coverage designed for the mind and body.
Flexible Downtime: Generous time off helps keep you energized for your time on.
Continuous Learning: Access a wealth of resources to grow your career and learn valuable new skills.
Invest in Your Future: Secure your financial future through competitive pay, retirement planning, a continuing education program with a company-matched student loan contribution, and financial wellness programs.
Family Friendly Perks: It’s not just about you. S&P Global has perks for your partners and little ones, too, with some best-in-class benefits for families.
Beyond the Basics: From retail discounts to referral incentive awards—small perks can make a big difference.