Design and develop end-to-end cloud-based solutions with a strong emphasis on data applications and infrastructure
Lead discovery and design sessions with customers to gather requirements and translate functional needs into detailed designs
Create and contribute to technical design documents and other project-related documentation
Work with stakeholders to identify technical and business requirements, and apply best practices and standards to achieve successful project outcomes
Consistently apply established practices and standards for cloud solutions
Write high-performance, reliable, and maintainable code
Develop test automation frameworks and associated tooling to ensure project success
Handle complex and diverse cloud-based projects, including tasks such as collecting, managing, analyzing, and visualizing very large datasets
Build efficient and scalable data pipelines for batch and real-time use cases across various source and target systems
Optimize ETL/ELT pipelines, troubleshoot pipeline issues, and enhance observability dashboards
Execute data pipeline-specific DevOps activities, such as IaC provisioning, data security implementation, and automation
Analyze potential issues, perform root cause analyses, and resolve technical challenges
Review bug descriptions, functional requirements, and design documents to ensure comprehensive test plans and test cases
Tune the performance of batch and real-time data processing pipelines
Ensure security best practices are followed when working on internal and customer-facing cloud data platforms
Build foundational CI/CD pipelines for all infrastructure components, data pipelines, and custom data applications
Develop observability and data quality solutions for data platforms, including ML and AI applications
Act as a trusted advisor for customers, addressing technical queries and providing support
Engage in thought leadership activities such as whitepaper authoring, conference presentations, and podcasting
Suggest and implement ways to improve project progress and efficiency
Participate in pre-sales activities when required
Requirements
Experience in implementing complex data architecture, data modeling, data design, and persistence (e.g., warehousing, data marts, data lakes)
Proficiency in a programming language such as Python, Java, Go, or Scala
Experience with big data cloud technologies like Microsoft Fabric, Databricks, EMR, Athena, Glue, BigQuery, Dataproc, and Dataflow
Ideally, you will have strong hands-on experience with Google Cloud Platform data technologies: BigQuery, Dataflow, and running PySpark and SparkSQL code on Dataproc
Solid understanding of Spark (PySpark or SparkSQL), including the DataFrame API, as well as analyzing and performance-tuning Spark queries
Strong experience in data orchestration using Apache Airflow
Experience developing frameworks and solutions to acquire, process, monitor, and extract value from large datasets
Highly proficient in SQL
Strong experience using code repositories such as GitHub, with demonstrable GitOps best practices
Good knowledge of popular database and data warehouse technologies and concepts, both cloud and conventional RDBMS, such as BigQuery, Redshift, Azure SQL Data Warehouse, Snowflake, etc.
Knowledge of how to design distributed systems and the trade-offs involved, including software engineering best practices for development, networking, source control systems, automated deployment pipelines such as Jenkins, and DevOps tools such as Terraform
Strong knowledge of CI/CD tools and frameworks such as Jenkins and GitLab for implementing DevOps pipelines
Proficiency in using GenAI tools for productivity (e.g., Copilot)
Strong knowledge of data orchestration solutions such as Oozie, Luigi, or Talend
Strong knowledge of dbt (data build tool) or Dataform
Experience with Snowflake
Experience with Apache Iceberg, Hudi, and query engines like Presto (Trino)
Knowledge of data catalogs (AWS Glue, Google Dataplex) and data governance or data quality solutions (e.g., Great Expectations) is an added advantage
Experience in performing DevOps activities such as IaC using Terraform, provisioning infrastructure in GCP/AWS/Azure, defining data security layers, etc
Experience in designing microservice architectures and REST API gateways is a plus
Knowledge of MLOps frameworks and orchestration pipelines such as Kubeflow or TFX is a plus
Certification in GCP, Azure, AWS, Snowflake, or Databricks
Tech Stack
Airflow
Amazon Redshift
Apache
AWS
Azure
BigQuery
Cloud
Distributed Systems
ETL
Google Cloud Platform
Java
Jenkins
PySpark
Python
RDBMS
Scala
Spark
SQL
Terraform
Go
Benefits
Competitive total rewards package
Blog during work hours; take a day off and volunteer for your favorite charity
Work flexibly and remotely from your home; there's no daily commute to an office!
Substantial training allowance; participate in professional development days, attend training, become certified, whatever you like!
All the equipment you need to work from home, including a laptop with your choice of OS, and an annual budget to personalize your work environment
Annual wellness budget to make yourself a priority (use it on gym memberships, massages, fitness and more)
Generous paid vacation and sick days