Responsibilities
Designing, developing, and maintaining data pipelines, including both batch ETL processes and real-time streaming solutions, to support our product teams
Implementing and managing Snowplow-based event tracking pipelines to collect, validate, and process user behavioral data in real time for analytics and product insights
Collaborating with cross-functional teams (product managers, analysts, data scientists, etc.) to understand data needs and deliver insightful, scalable data solutions
Replicating and generalizing successful data pipeline patterns to accelerate new pipeline development and ensure consistency and reliability across projects
Developing reusable data processing utilities and tooling (leveraging common data-centric libraries and frameworks in Python) to streamline ETL/ELT workflows
Optimizing database performance and ensuring high reliability of our data stores through query optimization, indexing, and SQL tuning
Monitoring and enhancing Snowplow pipeline performance and data quality: troubleshooting pipeline issues, optimizing event collection, and implementing improvements to maximize uptime and data accuracy
Supporting data governance and quality initiatives to maintain data integrity, privacy compliance, and consistency across all data pipelines
Providing technical guidance on data integration, transformation, and analytics best practices to team members and stakeholders
Building reports and dashboards (in collaboration with BI/Analytics teams) to empower product and business teams with actionable insights from collected data
Acting as a data solutions expert for the organization: advising and assisting product teams in selecting the right data architectures, tools, and approaches (e.g., choosing the appropriate data storage, streaming service, or analytics tool for a given need)
Translating business and user requirements into technical specifications by working closely with product managers and engineers to ensure data solutions meet real-world needs
Requirements
5+ years of experience in data engineering, database development, and cloud-based data solutions (especially on AWS)
Strong proficiency in SQL (T-SQL, PL/SQL) and experience with relational databases (e.g., Oracle, SQL Server) and cloud data warehouses (e.g., Snowflake, Redshift)
Hands-on experience with ETL/ELT tools and frameworks, including modern cloud integration services (e.g., AWS Glue, Apache Airflow, or Azure Data Factory) and dbt
Experience with Snowplow or similar event data tracking pipelines, including their implementation, maintenance, and optimization for behavioral data collection and analytics
Experience with data modeling, data integration, and data warehousing concepts
Strong programming skills in Python (e.g., Pandas, automation scripting)
Knowledge of data governance and data quality frameworks, as well as data privacy and security best practices (e.g., GDPR compliance)
Experience working in Agile development environments