NBCUniversal, one of the world's leading media and entertainment companies, is seeking a Data Engineer to join its Engineering & Operations team. The role involves designing, building, and operating scalable data pipelines and privacy-safe data integrations to support NBCUniversal's data collaboration ecosystem.
Responsibilities:
- Support partner onboarding into clean room environments across platforms such as Snowflake, LiveRamp, Databricks, or similar technologies
- Follow clean room architecture patterns that are secure, scalable, privacy-preserving, and repeatable across partner engagements
- Configure and manage clean room environments, including data access, environment setup, platform configuration, and release validation
- Serve as the technical owner for assigned partner onboarding efforts, coordinating with product, engineering, operations, privacy, and partner-facing teams
- Implement privacy-preserving controls such as aggregation thresholds, anonymization techniques, approved query patterns, and output validation checks
- Deploy and manage Python-based libraries, templates, and reusable components within the clean room and data platform ecosystem
- Support environment setup, configuration management, package deployment, and version-controlled release processes
- Partner with software engineering teams to operationalize reusable libraries for audience, measurement, reporting, and partner-facing workflows
- Ensure platform components are deployed consistently across partner environments and aligned with established engineering standards
- Design, implement, and enforce granular role-based access control policies across data platform environments
- Configure least-privilege service accounts, roles, grants, schemas, shares, and data access patterns
- Partner with security, privacy, and platform teams to ensure access controls meet internal policies and partner-specific requirements
- Validate that partner-facing outputs adhere to privacy, security, and business requirements before release
- Design, build, and operate scalable ELT pipelines using advanced SQL, Snowpark, PySpark, dbt, or similar technologies
- Develop and provision curated Gold datasets for audience, measurement, activation, and reporting use cases
- Build reusable pipeline patterns that support batch and near-real-time processing across Snowflake, Databricks, or similar platforms
- Translate business and analytical requirements into reliable, well-documented, production-ready data products
- Own pipeline performance, reliability, data correctness, and operational support for assigned data products
- Implement and evolve identity resolution logic that maps internal NBCU data to third-party identifiers such as LUIDs, RampIDs, TransUnion IDs, or similar identity frameworks
- Support privacy-safe identity workflows for audience matching, measurement, activation, and partner collaboration
- Build validation checks to ensure identity mappings are accurate, secure, and compliant with approved usage patterns
- Work with internal teams and external partners to troubleshoot match rates, data quality issues, and onboarding discrepancies
- Build automated data quality checks using tools such as Great Expectations, dbt tests, custom SQL assertions, or similar frameworks
- Define and monitor quality standards for schema drift, null rate spikes, volume anomalies, duplicate records, referential integrity, and unexpected data distribution changes
- Create test strategies for partner-facing releases, including input validation, output validation, regression testing, and privacy checks
- Document data assumptions, known limitations, validation logic, and operational support procedures
- Optimize query performance and platform costs through query tuning, clustering/partitioning strategies, caching, incremental processing, and workload management
- Implement query tagging, workload tracking, and chargeback/showback models to improve cost transparency and partner-level attribution
- Establish monitoring, alerting, runbooks, and standard operating procedures to improve platform reliability and reduce incident time-to-resolution
- Participate in incident response, root cause analysis, and continuous improvement efforts for production data workflows
Requirements:
- Bachelor's degree in Computer Science, Information Systems, Software Engineering, Electrical Engineering, Electronics Engineering, Data Engineering, or a related technical field, or equivalent practical experience
- 3+ years of experience in data engineering, including building and operating production data pipelines, data models, and data products
- Deep proficiency in advanced SQL and Python for data processing, automation, pipeline development, validation, and operational support
- 2+ years of hands-on experience with cloud data platforms such as Snowflake, Databricks, or similar technologies
- Experience building scalable ELT pipelines using tools such as Airflow, dbt, Snowpark, PySpark, or similar technologies
- Exposure to data clean room concepts or platforms such as Snowflake Clean Rooms, Databricks Clean Rooms, LiveRamp, Habu, or similar technologies
- Exposure to advertising technology, audience activation, campaign delivery, reach and frequency, attribution, incrementality, or reporting workflows
- Experience working with identity graphs, hashed identifiers, RampIDs, LUIDs, TransUnion IDs, device IDs, household IDs, or similar identity frameworks
- Snowflake SnowPro Core Certification, Databricks Certified Data Engineer Associate, or similar cloud/data platform certification