Growth Protocol is an Enterprise Reasoning Platform headquartered in New York. They are seeking a Senior Data Engineer to play a foundational role in building the systems that power their AI platform. The role involves close collaboration with various teams to build and maintain scalable data pipelines that deliver high-impact insights across industries.
Responsibilities:
- Work closely with Data Scientists to translate business and ML requirements into robust data workflows
- Ensure timely delivery of clean, reliable data to support model development and production features
- Engineer and manage a scalable ETL architecture using Airflow, Snowpark, Cloud Run, and Apache Beam
- Design and implement high-performance data infrastructure for seamless data processing and integration
- Extract data from diverse online platforms
- Operationalize machine learning models, focusing on deployment, reliability, and performance
- Partner with client IT teams to identify the most efficient and secure methods for data ingestion, including Snowflake Sharing, Databricks Delta Sharing, Private Link, and VPNs
- Work alongside the Platform Engineering team to define requirements for secure networking paths that support high-performance data transfers
- Perform end-to-end testing of client connections to ensure data integrity and connectivity
- Integrate customer databases with the Growth Protocol platform
- Create and manage real-time monitoring systems for data ingestion and transformation pipelines
- Proactively identify and resolve issues to maintain high levels of system reliability and data integrity
Requirements:
- 5+ years of experience in Data Engineering
- Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience
- Experience building data pipelines with robust unit and integration testing
- Proficiency in distributed computing frameworks such as Apache Beam and Apache Spark
- Functional understanding of enterprise networking including VPC peering, Private Link, and VPNs, with the ability to troubleshoot connectivity in a cloud environment
- Hands-on experience operationalizing ML models in production
- Familiarity with ML/AI, NLP, and data science workflows and tooling, including MLflow
- Deep understanding of ETL workflows, data modeling, and data architecture
- Strong debugging and problem-solving skills
- Excellent communication skills and experience collaborating across teams
- Experience working on enterprise products serving Fortune 500 clients across Financial Services, Industrials, and Consumer Products
- Prior startup experience
- Interest in current events, market dynamics, and emerging technologies
- Experience creating Agent Skills
- Familiarity with APIs and web scraping for data collection
- Familiarity with graph databases