Prodege, LLC, a cutting-edge marketing and consumer insights platform, is seeking a Principal Data Engineer to lead the design and modernization of its data architecture. The role involves building scalable data platforms and pipelines, ensuring data quality, and supporting data science and machine learning initiatives.
Responsibilities:
- Lead the design and implementation of the next-generation Lakehouse architecture (Iceberg / Trino / Snowflake) and refactor complex legacy data systems into modern, scalable patterns
- Design, build, and optimize high-scale, reliable ELT / ETL and streaming data pipelines using expert-level SQL, Python, Snowflake, dbt, and modern orchestration patterns
- Own the observability, lineage, quality, reliability, and governance frameworks for mission-critical datasets across the multi-product ecosystem
- Directly support Data Science and ML Engineering teams by delivering production-grade datasets, feature pipelines, and scalable data foundations for experimentation, model development, and decisioning
- Elevate the engineering bar across the team, set architectural standards, mentor engineers, and champion AI-assisted / AI-first development practices where they meaningfully improve productivity and quality
- Guide the organization on Medallion architecture, data contracts, schema evolution, tool selection, and scalable platform patterns while operating effectively in ambiguous and evolving problem spaces
- Architect, design, and implement components of the next-generation data platform / Lakehouse, leveraging Iceberg, Trino, Snowflake, and related modern data technologies
- Lead the simplification and refactoring of complex, high-volume legacy pipelines, migrating them toward modern, declarative ELT patterns (primarily via dbt) and scalable streaming / event-driven designs where appropriate
- Define and implement best practices for data storage, partitioning, clustering, schema evolution, and query design to optimize performance, reliability, and cloud compute cost
- Define and evangelize scalable architectural patterns, including Medallion architecture, data contracts, schema management, and platform standards across the data lifecycle
- Help the organization make pragmatic architectural decisions in ambiguous environments, balancing long-term platform design with short-term business needs
- Design, build, and maintain scalable, reliable data pipelines (batch and near-real-time) using Python, advanced SQL, orchestration tools (e.g., Airflow or similar), and modern data platform components
- Develop and enhance Snowflake data models, dbt models, and high-performance analytical data marts for consumption by BI, reporting, product applications, experimentation, and ML systems
- Own the entire pipeline lifecycle: requirements gathering → design → build → testing → deployment → monitoring → iteration
- Design and guide streaming and event-driven architectures where needed to support real-time or near-real-time use cases
- Implement and enhance data lineage, quality checks, observability, alerting, and reliability practices across core data pipelines and datasets
- Collaborate with Data Governance and Security teams to enforce data access controls, PII handling, retention policies, and compliance requirements
- Continuously monitor and tune pipeline performance to meet strict data SLAs / SLOs
- Establish high standards for trustworthy, well-governed data that can serve as the foundation for BI, ML, and business decision-making
- Work closely with Data Science and ML Engineering teams to understand and enable their training, inference, experimentation, and data serving needs
- Design and optimize data feeds for high-volume ML workloads, including feature pipelines, training datasets, and feature-store-like patterns where needed
- Ensure data consistency, quality, and integrity for critical AI-driven applications across consumer and business products
- Help build the data platform in a way that supports experimentation, model iteration, and scalable ML use cases across Performance Marketing, Rewards, CX, and related domains
- Actively use AI-assisted development tools (e.g., Copilot, Claude, Gemini, or similar) to accelerate coding, testing, documentation, troubleshooting, and architectural exploration
- Drive a strong AI-first mindset within the data organization by identifying where AI can improve developer productivity, pipeline development, debugging, design exploration, and documentation — while maintaining strong validation discipline
- Set high technical standards for code quality, testing, documentation, reliability, and maintainability within the Data Engineering team
- Provide technical leadership and mentorship to junior and mid-level engineers, running design reviews and driving consensus on architectural trade-offs
- Help grow the team’s capability to solve increasingly complex data problems through the right combination of tooling, process, architecture, and talent
- Partner closely with ML, BI, Product, Engineering, Analytics, and business stakeholders to ensure the data platform supports real business and product needs
- Translate business needs and ambiguous requirements into scalable, practical technical designs
- Guide teams on the selection and use of the right data tools, technologies, and platform components across the data lifecycle
Requirements:
- Bachelor's degree in Computer Science, Engineering, a quantitative field, or equivalent practical experience
- 6+ years of hands-on experience in Data Engineering, ideally in AdTech, MarTech, Growth, consumer internet, or other high-volume / multi-product environments
- Expert-level SQL proficiency, strong Python skills, and extensive experience building robust ETL / ELT workflows
- Strong experience with Snowflake and dbt for data transformation and analytics engineering
- Proven experience designing and building modern data platforms at scale
- Strong experience with batch and near-real-time data pipelines, event-driven systems, and/or streaming architectures
- Strong understanding of Medallion architecture, modern data modeling techniques, data contracts, schema evolution, and platform design patterns for analytics and ML
- Proven experience with performance tuning of large queries, cost / performance tradeoffs, and reliability of data infrastructure
- Experience with Iceberg, Trino, or similar open table format / query engine ecosystems in a Lakehouse architecture
- Ability to navigate and refactor complex, interconnected data systems with an ownership mindset (“you build it, you run it”)
- Experience partnering cross-functionally with Data Science, BI, Product, Engineering, and business teams to build scalable and trusted data foundations
- Ability to bring an AI-first mindset to the data engineering organization and help teams use AI effectively to solve complex data engineering problems
- Strong mentoring and technical leadership skills with ability to influence architecture and engineering direction across teams
- Comfortable working through ambiguity and helping define strategy in greenfield or evolving environments
- Experience with Kafka, Kinesis, or Apache Flink for streaming ingestion and event-driven data architectures
- Familiarity with feature stores, model-serving pipelines, and MLOps practices
- Professional experience using AI-driven development tools (e.g., GitHub Copilot, etc.) for coding, testing, or documentation generation
- Prior experience in a consumer rewards, survey, or performance marketing ecosystem