ETL/ELT Pipeline Development: Build and maintain scalable data pipelines using AWS. Implement both batch and incremental load patterns for BI reporting and application data needs.
Real-Time Data Streaming: Develop and manage real-time data ingestion pipelines using Kafka. Ensure low-latency, fault-tolerant data flow for critical business workflows.
Workflow Orchestration: Build, schedule, and monitor end-to-end data workflows using Apache Airflow. Manage dependencies, retries, and alerting for production DAGs.
Data Warehouse Management: Administer and optimize Amazon Redshift clusters, including schema design, query performance tuning, distribution and sort key selection, and vacuuming, to ensure high availability and cost efficiency.
Data Quality & Observability: Implement automated data quality checks at ingestion and transformation stages. Define validation rules, build alerting for anomalies and discrepancies, and establish SLAs to ensure stakeholders can trust the data they use.
API Integrations: Integrate third-party and internal REST APIs into data pipelines to pull operational and product data into the warehouse.
Cloud Cost Optimization: Monitor and right-size data processing and storage resources across S3, EMR, Redshift, EC2, and Lambda. Proactively identify inefficiencies and propose cost-saving improvements.
BI & Analytics Collaboration: Partner with the BI team to align data models, preprocessing logic, and Redshift schema design with reporting and dashboard needs.
Requirements
Bachelor’s degree in Computer Science or a related quantitative field.
2+ years of experience working as a Data Engineer
Proficiency in Python and SQL for data transformation and pipeline development
Hands-on experience with Apache Spark (PySpark) for large-scale data processing
Working knowledge of Kafka for real-time data ingestion and stream processing
Hands-on experience managing and maintaining Airflow DAGs in production environments
Familiarity with Redshift performance tuning, schema design, and query optimization
Experience implementing automated data validation and quality checks within pipelines
Detail-oriented with a keen interest in data transformations and their impact on business outcomes
Problem-solving and time management skills
Prior experience in project or team management is preferred; enthusiasm for mentoring and guiding others is a plus.
Tech Stack
Airflow
Amazon Redshift
Apache
AWS
Cloud
EC2
ETL
Kafka
PySpark
Python
Spark
SQL
Benefits
Professional growth in a dynamic, rapidly expanding, high-social-impact industry
An open-minded, collaborative culture of enthusiastic colleagues who are driven by the challenge of innovating for a profound impact on people and the planet.
A truly multicultural experience: you will have the chance to work with and learn from people from different geographies, nationalities, and backgrounds.
Structured, tailored learning and development programs that help you become a better leader, manager, and professional through the Sun King Center for Leadership.