Prodege, LLC is a cutting-edge marketing and consumer insights platform seeking a Principal Data Engineer to lead the design and modernization of its data architecture. The role involves building scalable data pipelines, ensuring data quality, and supporting AI/ML initiatives that optimize the company's flagship products.
Responsibilities:
- Lead the design and implementation of the Lakehouse architecture (Iceberg/Trino) and refactor complex legacy data systems into modern patterns
- Design, build, and optimize high-scale, reliable ELT/ETL data pipelines using expert-level SQL, Python, Snowflake, and dbt
- Own the observability, lineage, quality, and governance frameworks for mission-critical datasets across the multi-product ecosystem
- Directly support Data Science and ML Engineering teams by delivering production-grade datasets and optimizing feature engineering pipelines
- Elevate the engineering bar across the team, championing best practices and using AI-assisted development tools to accelerate workflows
- Architect, design, and implement components of the next-generation Lakehouse platform, leveraging Iceberg, Trino, and Snowflake
- Lead the simplification and refactoring efforts for complex, high-volume legacy pipelines, migrating them to modern, declarative ELT patterns (primarily via dbt)
- Define and implement best practices for data storage, partitioning, clustering, and schema evolution to optimize performance and reduce cloud compute costs
- Design, build, and maintain scalable, reliable data pipelines (batch and near real-time) using Python, expert-level SQL, and orchestration tools such as Airflow (a minimal sketch of this pattern follows this list)
- Develop and enhance Snowflake data models, dbt models, and high-performance analytical data marts for consumption by BI, reporting, and product applications
- Own the entire pipeline lifecycle: requirements gathering → design → build → unit/integration testing → deployment → monitoring → iteration
- Implement and enhance data lineage, quality checks (via dbt tests/Great Expectations), observability, and alerting across core data pipelines
- Collaborate with Data Governance and Security teams to enforce data access controls, PII handling, and retention policies
- Continuously monitor and tune pipeline performance to meet strict data SLAs (Service Level Agreements) and SLOs (Service Level Objectives)
- Work closely with Data Science and ML Engineering teams to understand and enable their training and serving data needs
- Design and optimize data feeds for high-volume Machine Learning workloads, including the development of feature stores and model-serving pipelines
- Ensure data consistency and integrity for critical AI-driven applications across consumer and business products
- Actively use AI-assisted development tools (e.g., GitHub Copilot, Gemini) to accelerate coding, generate documentation, draft tests, and simplify complex spec generation
- Set high technical standards for code quality, testing, and documentation within the Data Engineering team
- Provide technical leadership and mentorship to junior and mid-level engineers, running design reviews and driving consensus on architectural trade-offs
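For illustration only, here is a minimal sketch of the kind of orchestrated batch pipeline named above, assuming Airflow 2.x's TaskFlow API; the DAG, tasks, bucket, and table names are hypothetical and not details from this posting.

```python
# Minimal sketch of a daily batch ELT pipeline, assuming Airflow 2.x's
# TaskFlow API. All names (DAG id, tasks, staging URI) are hypothetical.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False, tags=["elt"])
def daily_events_elt():
    @task
    def extract_events(ds=None) -> str:
        # Land the day's raw events in object storage; return the staged URI.
        # A real task would call a source API or read from an event bus.
        return f"s3://example-bucket/raw/events/{ds}/"

    @task
    def load_to_warehouse(staging_uri: str) -> None:
        # Copy the staged files into a raw warehouse table; transformations
        # stay downstream in declarative dbt models rather than in this task.
        print(f"COPY INTO raw.events FROM '{staging_uri}'")

    load_to_warehouse(extract_events())


daily_events_elt()
```

Keeping the load step a thin copy and leaving transformations to dbt models matches the declarative ELT pattern the role emphasizes.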
Requirements:
- Bachelor's degree in Computer Science or an equivalent area of study, or equivalent years of relevant experience
- Six or more (6+) years of hands-on experience in data engineering, ideally in multi-product, high-volume, or consumer-scale environments
- Expert-level proficiency in SQL, strong Python, and extensive experience building robust ETL/ELT workflows
- Strong experience with Snowflake and dbt (Data Build Tool) for data transformation and analytics engineering
- Proven experience with modern data modeling techniques (e.g., Kimball, Data Vault, semantic layers) and performance tuning of large queries
- Experience with Iceberg, Trino, or similar open table format/query engine ecosystems in a Lakehouse architecture (see the sketch after this list)
- Ability to navigate and refactor complex, interconnected data systems with an ownership mindset (you build it, you run it)
- Experience with Kafka, Kinesis, or Apache Flink for streaming ingestion and event-driven data architectures
- Familiarity with feature stores, model-serving pipelines, and MLOps practices
- Professional experience using AI-driven development tools (e.g., GitHub Copilot) for coding, testing, or documentation generation
- Prior experience in a consumer rewards, survey, or performance marketing ecosystem
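As a small, hedged illustration of the Iceberg/Trino requirement above, the sketch below defines a day-partitioned Iceberg table through the `trino` Python client; the host, catalog, schema, and column names are placeholders, not details from this posting.

```python
# Sketch: defining a partitioned Iceberg table via Trino, using the
# `trino` Python client. Connection details and names are hypothetical.
from trino.dbapi import connect

conn = connect(
    host="trino.example.internal",  # placeholder coordinator host
    port=8080,
    user="data-eng",
    catalog="iceberg",
    schema="analytics",
)
cur = conn.cursor()

# Hidden partitioning on day(event_ts) lets the engine prune files at
# query time without exposing a partition column to consumers.
cur.execute(
    """
    CREATE TABLE IF NOT EXISTS events (
        event_id   VARCHAR,
        user_id    BIGINT,
        event_ts   TIMESTAMP(6),
        payload    VARCHAR
    )
    WITH (
        format = 'PARQUET',
        partitioning = ARRAY['day(event_ts)']
    )
    """
)
cur.fetchall()  # drain the result set to ensure the statement completed
```

Partition transforms like day(event_ts) are one reason the partitioning, clustering, and schema-evolution duties listed earlier can be met without breaking downstream consumers.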