Build and manage reliable data pipelines covering data ingestion/collection, processing, integration, storage, and availability across the organization
Work within a distributed systems architecture for massively parallel processing (MPP) of data, combining heterogeneous data sources and collaborating with analytics and data science teams to build solutions and generate data-driven value
Requirements
Practical experience with ingestion, integration, processing, and storage of large volumes of data
Experience working on Big Data projects
Experience with Behavior-Driven Development (BDD)
Data extraction using Python and data processing with PySpark
Experience with ETL tools
Knowledge of relational and dimensional data modeling (Data Warehouse)
Experience with SQL databases
Experience with AWS Big Data tools such as EMR, Kinesis, Redshift, S3, Glue, and Elasticsearch
Knowledge of Kafka
Knowledge of Data Lake and DataOps
Experience with Data Science
Nice to have: AWS certifications
Knowledge of infrastructure-as-code provisioning tools for the cloud, such as Terraform and CloudFormation
Tech Stack
Amazon Redshift
AWS
Cloud
Elasticsearch
ETL
Kafka
PySpark
Python
SQL
Terraform
Benefits
Swile flexible card you can use as you wish (meal and food allowances)