Develop and maintain a comprehensive data architecture and cloud strategy that aligns with the organization's goals and needs.
Design, implement, and manage cloud-based data infrastructure on AWS, ensuring scalability, reliability, and cost-efficiency.
Utilize AWS services (S3, Glue, EMR, Redshift, Lambda, Kinesis, MWAA, etc.) to build and optimize data pipelines and storage solutions.
Champion the use of data lakehouse architecture and optimize its performance for analytical and operational workloads.
Identify gaps and opportunities in the current system, and propose and implement improvements to optimize processes and costs.
Lead and guide data engineering teams to develop, maintain, and optimize ETL processes for data ingestion, transformation, and loading.
Implement real-time data processing solutions using technologies such as Apache Kafka and AWS Kinesis.
Collaborate with data scientists, business stakeholders, and analysts to ensure data availability and quality, enabling effective analytics and reporting.
Leverage DBT for data modeling and transformation to support self-service analytics and data governance.
Architect and implement data integration solutions for API ingestion, enabling data from diverse sources to be captured, transformed, and ingested into our data lakehouse.
Utilize Airbyte and custom APIs to ensure efficient, reliable, and secure data transfers.
Manage data integration pipelines to support real-time and batch data processing.
Design, configure, and maintain workflow orchestration using Apache Airflow to automate ETL processes and data pipeline executions.
Monitor and optimize job scheduling, error handling, and performance of data workflows.
Implement data security protocols, access controls, and encryption to safeguard sensitive data, especially personally identifiable information (PII).
Ensure compliance with data privacy regulations and industry standards.
Collaborate with cross-functional teams to understand data requirements and provide data solutions to meet their needs.
Maintain comprehensive documentation for data engineering and data architecture processes and solutions.
Guide the team in setting up cloud infrastructure and automating it using tools such as Terraform, CloudFormation, and Jenkins.
Guide the operations team in setting up automated monitoring and alerting mechanisms.
Requirements
Bachelor's or higher degree in a relevant field.
6+ years of proven experience in data engineering, cloud architecture, and AWS services.
Extensive knowledge of data lakehouse technologies, including Apache Hudi, DBT, Airbyte, Redshift, Glue, Kinesis, and Apache Airflow.
Strong expertise in programming languages such as SQL and Python, and in processing frameworks such as PySpark.
Strong expertise in real-time data processing.
Excellent problem-solving and analytical skills.
Strong communication and teamwork abilities.
Passion for sports, gaming, or entertainment is preferred.