Prodege, LLC is a cutting-edge marketing and consumer insights platform seeking a Principal Data Engineer to lead the design and modernization of its data architecture. The role involves building scalable data pipelines, ensuring data quality, and supporting AI/ML initiatives that optimize the company's flagship products.
Responsibilities:
- Lead the design and implementation of the Lakehouse architecture (Iceberg/Trino) and refactor complex legacy data systems into modern patterns
- Design, build, and optimize high-scale, reliable ELT/ETL data pipelines using expert-level SQL, Python, Snowflake, and dbt
- Own the observability, lineage, quality, and governance frameworks for mission-critical datasets across the multi-product ecosystem
- Directly support Data Science and ML Engineering teams by delivering production-grade datasets and optimizing feature engineering pipelines
- Elevate the engineering bar across the team, championing best practices and using AI-assisted development tools to accelerate workflows
- Architect, design, and implement components of the next-generation Lakehouse platform, leveraging Iceberg, Trino, and Snowflake
- Lead the simplification and refactoring efforts for complex, high-volume legacy pipelines, migrating them to modern, declarative ELT patterns (primarily via dbt)
- Define and implement best practices for data storage, partitioning, clustering, and schema evolution to optimize performance and reduce cloud compute costs
- Design, build, and maintain scalable, reliable data pipelines (batch and near real-time) using Python, expert-level SQL, and orchestration tools such as Airflow (a minimal sketch of this pattern follows this list)
- Develop and enhance Snowflake data models, dbt models, and high-performance analytical data marts for consumption by BI, reporting, and product applications
- Own the entire pipeline lifecycle: requirements gathering → design → build → unit/integration testing → deployment → monitoring → iteration
- Implement and enhance data lineage, quality checks (via dbt tests/Great Expectations), observability, and alerting across core data pipelines
- Collaborate with Data Governance and Security teams to enforce data access controls, PII handling, and retention policies
- Continuously monitor and tune pipeline performance to meet strict data SLAs (Service Level Agreements) and SLOs (Service Level Objectives)
- Work closely with Data Science and ML Engineering teams to understand and enable their training and serving data needs
- Design and optimize data feeds for high-volume Machine Learning workloads, including the development of feature stores and model-serving pipelines
- Ensure data consistency and integrity for critical AI-driven applications across consumer and business products
- Actively use AI-assisted development tools (e.g., GitHub Copilot, Gemini) to accelerate coding, generate documentation, draft tests, and simplify complex spec generation
- Set high technical standards for code quality, testing, and documentation within the Data Engineering team
- Provide technical leadership and mentorship to junior and mid-level engineers, running design reviews and driving consensus on architectural trade-offs
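For illustration only, here is a minimal sketch of the kind of orchestrated batch pipeline named above, assuming Airflow 2.x's TaskFlow API; the DAG, tasks, bucket, and table names are hypothetical and not details from this posting.

```python
# Minimal sketch of a daily batch ELT pipeline, assuming Airflow 2.x's
# TaskFlow API. All names (DAG id, tasks, staging URI) are hypothetical.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False, tags=["elt"])
def daily_events_elt():
    @task
    def extract_events(ds=None) -> str:
        # Land the day's raw events in object storage; return the staged URI.
        # A real task would call a source API or read from an event bus.
        return f"s3://example-bucket/raw/events/{ds}/"

    @task
    def load_to_warehouse(staging_uri: str) -> None:
        # Copy the staged files into a raw warehouse table; transformations
        # stay downstream in declarative dbt models rather than in this task.
        print(f"COPY INTO raw.events FROM '{staging_uri}'")

    load_to_warehouse(extract_events())


daily_events_elt()
```

Keeping the load step a thin copy and leaving transformations to dbt models matches the declarative ELT pattern the role emphasizes.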
Requirements:
- Bachelor's degree in Computer Science or an equivalent area of study, or equivalent years of relevant experience
- Six or more (6+) years of hands-on experience in data engineering, ideally in multi-product, high-volume, or consumer-scale environments
- Expert-level proficiency in SQL, strong Python, and extensive experience building robust ETL/ELT workflows
- Strong experience with Snowflake and dbt (Data Build Tool) for data transformation and analytics engineering
- Proven experience with modern data modeling techniques (e.g., Kimball, Data Vault, semantic layers) and performance tuning of large queries
- Experience with Iceberg, Trino, or similar open table format/query engine ecosystems in a Lakehouse architecture (see the sketch after this list)
- Ability to navigate and refactor complex, interconnected data systems with an ownership mindset (you build it, you run it)
- Experience with Kafka, Kinesis, or Apache Flink for streaming ingestion and event-driven data architectures
- Familiarity with feature stores, model-serving pipelines, and MLOps practices
- Professional experience using AI-driven development tools (e.g., GitHub Copilot) for coding, testing, or documentation generation
- Prior experience in a consumer rewards, survey, or performance marketing ecosystem
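As a small, hedged illustration of the Iceberg/Trino requirement above, the sketch below defines a day-partitioned Iceberg table through the `trino` Python client; the host, catalog, schema, and column names are placeholders, not details from this posting.

```python
# Sketch: defining a partitioned Iceberg table via Trino, using the
# `trino` Python client. Connection details and names are hypothetical.
from trino.dbapi import connect

conn = connect(
    host="trino.example.internal",  # placeholder coordinator host
    port=8080,
    user="data-eng",
    catalog="iceberg",
    schema="analytics",
)
cur = conn.cursor()

# Hidden partitioning on day(event_ts) lets the engine prune files at
# query time without exposing a partition column to consumers.
cur.execute(
    """
    CREATE TABLE IF NOT EXISTS events (
        event_id   VARCHAR,
        user_id    BIGINT,
        event_ts   TIMESTAMP(6),
        payload    VARCHAR
    )
    WITH (
        format = 'PARQUET',
        partitioning = ARRAY['day(event_ts)']
    )
    """
)
cur.fetchall()  # drain the result set to ensure the statement completed
```

Partition transforms like day(event_ts) are one reason the partitioning, clustering, and schema-evolution duties listed earlier can be met without breaking downstream consumers.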