Responsibilities
Define secure, scalable, and repeatable partner onboarding and clean room architecture patterns across Snowflake, LiveRamp, and Databricks.
Configure and manage partner-specific clean room environments; deploy and maintain Python-based libraries within the platform ecosystem.
Establish and maintain MLOps practices, including model serving, monitoring, and pipeline orchestration for AI/ML features deployed within the platform ecosystem.
Own design and enforcement of granular RBAC policies and least-privilege service accounts.
Serve as the technical lead for onboarding new partners, implementing privacy-preserving controls (e.g., aggregation thresholds and anonymization techniques).
Design, build, and operate scalable ELT pipelines using Snowpark and/or PySpark and advanced SQL to provision Gold datasets.
Implement and evolve identity resolution logic that maps internal data to third-party (3P) identifiers (including LUIDs, RampIDs, and TransUnion IDs) while ensuring privacy-safe practices.
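As a minimal illustration of what "privacy-safe" can mean in this mapping step, internal identifiers can be pseudonymized with a keyed hash before they ever touch a third-party join. The function name, key handling, and normalization rule below are assumptions for the sketch, not a prescribed implementation:

```python
import hashlib
import hmac

def pseudonymize(internal_id: str, secret_key: bytes) -> str:
    """Return a stable, non-reversible token for an internal identifier.

    Uses HMAC-SHA256 so the mapping cannot be re-derived without the
    secret key; the lowercase/strip normalization is an illustrative
    convention so equivalent raw IDs hash to the same token.
    """
    normalized = internal_id.strip().lower()
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
```

The keyed construction (rather than a plain hash) is the design point: a bare SHA-256 over low-entropy IDs can be reversed by dictionary attack, while an HMAC cannot without the key.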
Design and operate scalable data architectures across Snowflake and Databricks supporting batch and near real-time processing patterns.
Build robust automated checks (e.g., Great Expectations or custom SQL assertions) and define quality standards to detect schema drift, null rate spikes, and volume anomalies.
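A custom SQL-assertion-style check of this kind can be sketched in a few lines. This sketch assumes per-batch row counts and null counts are already profiled upstream; the thresholds and names are illustrative, not a standard:

```python
from dataclasses import dataclass

@dataclass
class BatchProfile:
    """Per-batch profile stats (assumed to come from an upstream profiler)."""
    row_count: int
    null_counts: dict  # column name -> number of null values

def check_batch(profile: BatchProfile, baseline_rows: int,
                max_null_rate: float = 0.05,
                volume_tolerance: float = 0.5) -> list:
    """Return human-readable quality violations for one batch.

    Flags columns whose null rate exceeds max_null_rate, and batches
    whose row count deviates from the baseline by more than
    volume_tolerance (as a fraction of the baseline).
    """
    violations = []
    for column, nulls in profile.null_counts.items():
        rate = nulls / profile.row_count if profile.row_count else 1.0
        if rate > max_null_rate:
            violations.append(f"null rate spike in {column}: {rate:.1%}")
    if baseline_rows and abs(profile.row_count - baseline_rows) / baseline_rows > volume_tolerance:
        violations.append(f"volume anomaly: {profile.row_count} rows vs baseline {baseline_rows}")
    return violations
```

In practice the same assertions would be expressed as Great Expectations expectations or scheduled SQL checks; the value is the explicit, reviewable thresholds.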
Lead performance optimization across platforms (query tuning, caching, incremental processing) and define and implement query tagging and chargeback models for accurate cost attribution.
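The chargeback idea reduces to attributing metered usage back to owners via query tags. The tag format (`team=...;job=...`) and the flat per-credit rate below are assumptions made for this sketch only:

```python
from collections import defaultdict

def chargeback(query_records, credit_rate: float = 3.0) -> dict:
    """Sum dollar cost per team from (query_tag, credits_used) records.

    Assumes tags are semicolon-separated key=value pairs, e.g.
    "team=growth;job=daily_gold"; untagged usage is bucketed separately
    so it stays visible rather than silently disappearing.
    credit_rate is an assumed price per credit in dollars.
    """
    costs = defaultdict(float)
    for tag, credits in query_records:
        fields = dict(kv.split("=", 1) for kv in tag.split(";") if "=" in kv)
        team = fields.get("team", "untagged")
        costs[team] += credits * credit_rate
    return dict(costs)
```

Keeping an explicit "untagged" bucket is the useful design choice: its size measures how well the tagging policy is actually being followed.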
Establish monitoring, alerting, runbooks, and standard operating procedures to improve platform reliability and reduce incident time-to-resolution.
Validate that output data adheres to privacy and business requirements, and define test strategies for partner-facing releases.
Serve as the escalation point for diagnosing connection failures, data discrepancies, or latency issues with partner technical teams.
Design and build internal AI agents (using frameworks like LangChain, Snowflake Cortex) and mentor other engineers through code reviews, design discussions, and operational best practices.
Requirements
Bachelor’s degree or higher in Computer Science, Information Systems, Software Engineering, Electrical Engineering, or Electronics Engineering.
5+ years of Data Engineering experience, with deep proficiency in advanced SQL and Python.
3+ years of hands-on experience with cloud data platforms, specifically Snowflake or Databricks.
Proven experience building and operating scalable ELT pipelines using orchestration tools (e.g., Airflow, dbt).
Strong track record designing production-grade systems (observability, reliability, performance tuning, incident response).
Clean Room Knowledge: Exposure to Data Clean Room concepts and platforms such as LiveRamp, Snowflake, or Databricks.
AI/LLM Experience: Experience building applications with LLMs, RAG, Vector Databases, or frameworks like LangChain/LlamaIndex.
Ability to mentor other engineers through code reviews, design discussions, and operational best practices.