Role Overview
- Own the architectural direction for streaming data ingestion from GCP into AWS
- Design resilient ingestion frameworks including error handling, retry strategies, monitoring, and failure isolation
- Implement distributed processing pipelines using Spark/PySpark or similar frameworks (see the streaming sketch after this list)
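To make the ingestion responsibilities concrete, here is a minimal PySpark Structured Streaming sketch. It assumes a Kafka topic bridges GCP Pub/Sub into AWS, which is one common pattern rather than the prescribed architecture here; the broker address, topic name, and S3 paths are illustrative placeholders.

```python
# Minimal sketch only: broker, topic, and S3 paths below are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("gcp-to-aws-ingest").getOrCreate()

# Read events from a Kafka topic assumed to bridge GCP Pub/Sub into AWS.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "gcp-events")                 # placeholder topic
    .option("failOnDataLoss", "false")                 # tolerate broker-side gaps
    .load()
)

events = raw.select(F.col("value").cast("string").alias("payload"))

# Checkpointing gives at-least-once delivery: a failed micro-batch is
# retried from the last committed offset after a restart.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3a://example-bucket/ingest/")             # placeholder
    .option("checkpointLocation", "s3a://example-bucket/ckpt/")  # placeholder
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```

For failure isolation, malformed records would typically be routed to a dead-letter path (for example via foreachBatch); that is omitted here for brevity.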
Data Warehousing & dbt Leadership
- Create and maintain scalable data warehouses and the associated ETL/ELT processes using dbt models in Amazon Redshift
- Design and implement dbt projects, including macros, tests, documentation, and reusable modeling patterns
- Conduct Redshift query and dbt performance tuning to optimize warehouse efficiency and cost (see the sketch after this list)
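As one illustration of the tuning work, the sketch below pulls a query plan through the Redshift Data API via boto3; the cluster, database, user, and the query itself are hypothetical placeholders.

```python
# Sketch: fetch a Redshift query plan via the asynchronous Data API.
# Cluster, database, user, and SQL below are placeholders.
import time

import boto3

client = boto3.client("redshift-data")

resp = client.execute_statement(
    ClusterIdentifier="example-cluster",  # placeholder
    Database="analytics",                 # placeholder
    DbUser="etl_user",                    # placeholder
    Sql="EXPLAIN SELECT customer_id, SUM(amount) FROM fct_orders GROUP BY 1;",
)

# The Data API is asynchronous, so poll until the statement settles.
while True:
    status = client.describe_statement(Id=resp["Id"])["Status"]
    if status in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(1)

if status == "FINISHED":
    # Each record is one row of the plan; print its single text column.
    for record in client.get_statement_result(Id=resp["Id"])["Records"]:
        print(record[0]["stringValue"])
```

Plan steps showing broad redistributions (e.g. DS_DIST_BOTH) are the usual starting points for revisiting distribution and sort keys.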
Engineering Standards & Quality
- Define and enforce best practices for:
  - Data modeling
  - Version control (Git-based workflows)
  - CI/CD pipelines for dbt deployments
  - Automated testing at the model, transformation, and pipeline levels
- Ensure robust testing is embedded in every dbt model (schema tests, custom tests, data validation checks); a CI sketch that gates deployment on these tests follows this list
- Lead code reviews and architectural design reviews
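One way the CI/CD and testing expectations could be wired together: a slim-CI gate that builds only modified dbt models and their tests before a deployment is allowed. This assumes dbt-core 1.5+ (which exposes the dbtRunner entry point); the selector and state path are illustrative.

```python
# Sketch of a CI gate: build changed dbt models plus their tests, and
# fail the pipeline if anything breaks. Paths/selectors are placeholders.
import sys

from dbt.cli.main import dbtRunner  # available in dbt-core >= 1.5

runner = dbtRunner()

# `dbt build` runs models and their schema/custom tests in DAG order,
# so a failing test blocks the deployment in a single step.
result = runner.invoke([
    "build",
    "--select", "state:modified+",  # only what changed, plus downstream
    "--state", "prod-artifacts/",   # placeholder: prior production manifest
    "--fail-fast",
])

sys.exit(0 if result.success else 1)
```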
AWS Platform & Big Data Tooling
- Work with AWS services including Redshift, S3, Glue, Step Functions, Lambda (Python), Athena, and EMR (a small orchestration sketch follows)
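A small sketch of how these services typically compose: a Lambda (Python) handler, invoked as a Step Functions task, that launches a Glue job via boto3. The job name and argument keys are hypothetical.

```python
# Sketch of a Step Functions-invoked Lambda that starts a Glue job.
# Job name and argument keys are placeholders, not a real deployment.
import boto3

glue = boto3.client("glue")

def handler(event, context):
    # Step Functions passes workflow state in `event`.
    run = glue.start_job_run(
        JobName="example-ingest-job",  # placeholder Glue job
        Arguments={"--partition_date": event.get("partition_date", "")},
    )
    # Return the run id so a later state can poll get_job_run for status.
    return {"job_run_id": run["JobRunId"]}
```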
Requirements
- 6+ years of data engineering experience in big data environments
- Proven experience designing and implementing streaming architectures
- Extensive hands-on dbt experience (models, macros, tests, documentation)
- Strong Amazon Redshift architecture and performance optimization expertise
- Experience building CI/CD pipelines for data platforms
- Experience working in client-facing delivery contexts
Technical Skills
- AWS: Redshift, S3, Glue, Step Functions, Lambda (Python), Athena, EMR
- Strong SQL and Redshift performance tuning expertise
- Python and PySpark (or equivalent distributed processing frameworks)
- Git-based version control workflows
- Deep understanding of data warehousing, modeling, and big data systems
Tech Stack
- Amazon Redshift
- AWS
- ETL
- Google Cloud Platform
- PySpark
- Python
- Spark
- SQL