Engage regularly with model developers, validators, and risk stakeholders to understand their evolving data needs for model development, monitoring, and governance
Partner with credit analytics, risk, fraud, marketing, and operations functions to identify, define, and prioritize use cases requiring model-ready data
Build scalable data architectures to support real-time and batch monitoring, including data ingestion, enrichment, and retention practices
Support pipeline development by designing and maintaining automated end-to-end ML pipelines for data collection, preprocessing, feature engineering, and model training
Conduct data transformation by converting raw observations into variables (features) that machine learning models can consume, such as turning timestamps into cyclical time features (see the first sketch following this list)
Transform theoretical data science prototypes into robust, high-performance software systems that can handle large volumes of real-time data
Build and maintain automated pipelines that handle not just code, but also data validation, model training, and artifact management
Design, develop, and maintain robust pipelines to collect, transform, and store data used in model monitoring workflows (e.g., scoring data, performance metrics, outcomes); see the second sketch following this list
Provide thought and technical leadership in generating new signals from raw data by applying techniques such as normalization, scaling, and categorical encoding
Integrate data pipelines with model lifecycle platforms, MLOps tools, and observability solutions to ensure seamless model performance tracking
Partner with model risk and compliance teams to ensure data lineage, audit trails, and documentation are preserved and accessible for regulatory reviews (e.g., SR 11-7 compliance)
Liaise with cloud, data lake, data warehouse, and model governance engineering teams on delivery execution and backlog prioritization
Collaborate with data scientists, model validators, and product managers to align monitoring data infrastructure with evolving model monitoring requirements
Optimize data storage and compute performance for large-scale monitoring use cases involving high-frequency scoring or model ensembles
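For illustration only, a minimal sketch of the feature-engineering transformations named above (cyclical time features, numeric scaling, and categorical encoding), assuming pandas and scikit-learn are available; the column names (event_time, balance, product_type) are hypothetical:

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "event_time": pd.to_datetime(["2024-01-05 08:30", "2024-01-05 23:10"]),
    "balance": [1200.0, 87000.0],
    "product_type": ["card", "auto_loan"],
})

# Cyclical time features: map hour-of-day onto a circle so 23:00 and 00:00 stay close
hour = df["event_time"].dt.hour
df["hour_sin"] = np.sin(2 * np.pi * hour / 24)
df["hour_cos"] = np.cos(2 * np.pi * hour / 24)

# Scale a numeric input to zero mean / unit variance
df["balance_scaled"] = StandardScaler().fit_transform(df[["balance"]]).ravel()

# One-hot encode a categorical input
df = df.join(pd.get_dummies(df["product_type"], prefix="product"))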
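Likewise, a minimal sketch of the kind of data-validation and performance checks a monitoring pipeline might run on scoring data; the column names (score, account_id, default_flag) are hypothetical, and AUC is used only as an example metric:

import pandas as pd
from sklearn.metrics import roc_auc_score

def validate_scoring_batch(scores: pd.DataFrame) -> list:
    """Return a list of data-quality issues found in a scoring extract."""
    issues = []
    if scores["score"].isna().any():
        issues.append("null model scores present")
    if not scores["score"].between(0, 1).all():
        issues.append("scores outside the expected [0, 1] range")
    if scores["account_id"].duplicated().any():
        issues.append("duplicate account_id rows")
    return issues

def performance_snapshot(scores: pd.DataFrame, outcomes: pd.DataFrame) -> float:
    """Join scores to observed outcomes and compute AUC for the period."""
    joined = scores.merge(outcomes, on="account_id", how="inner")
    return roc_auc_score(joined["default_flag"], joined["score"])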
Requirements
Bachelor’s degree in a quantitative, technical, or data-focused field (e.g., Statistics, Mathematics, Computer Science, Data Science, Engineering) with 6+ years of relevant experience, or, in lieu of a degree, 8+ years of relevant work experience in monitoring, validation, or credit risk strategy
Minimum of 6 years of professional experience in model operations, data engineering, or analytics infrastructure
Strong proficiency with data engineering tools and frameworks (e.g., Apache Spark, Airflow, Kafka, dbt, PySpark)
Proficiency in programming languages such as SAS, Python, and SQL for building monitoring pipelines and validation checks
Experience with cloud-based data infrastructure (e.g., AWS, Azure, GCP) and data warehousing (e.g., Snowflake, Redshift, BigQuery)
Familiarity with MLOps practices, model metadata tracking (e.g., MLflow), and monitoring toolkits (e.g., Evidently AI, WhyLabs, Prometheus)
Understanding of model risk governance requirements and the role of data engineering in ensuring compliant model monitoring
Ability to work in an agile environment and deliver high-quality, production-grade code in collaboration with DevOps and platform engineering teams
Tech Stack
Airflow
Amazon Redshift
AWS
Azure
BigQuery
Google Cloud Platform
Kafka
Prometheus
PySpark
Python
Spark
SQL
Benefits
Best-in-class employee benefits
Programs that cater to work-life integration and overall well-being