Build and manage reliable data pipelines covering the ingestion/collection, processing, integration, storage, and provisioning of data across the organization.
Work within a distributed, massively parallel processing (MPP) architecture, combining multiple heterogeneous data sources and collaborating with analytics and data science teams to build solutions and generate data-driven value.
Requirements
Hands-on experience with ingestion, integration, processing, and storage of large volumes of data;
Experience working on Big Data projects;
Behavior Driven Development (BDD);
Data extraction in Python and data processing with PySpark;
Experience with ETL tools;
Knowledge of relational and dimensional data modeling (Data Warehouse);
Experience with SQL databases;
Experience with AWS Big Data-related tools such as EMR, Kinesis, Redshift, S3, Glue, Elasticsearch;
Knowledge of Kafka;
Familiarity with Data Lake and DataOps;
Preferred: AWS certifications;
Experience with infrastructure-as-code tools for cloud provisioning such as Terraform and CloudFormation.
Tech Stack
Amazon Redshift
AWS
Cloud
Elasticsearch
ETL
Kafka
PySpark
Python
SQL
Terraform
Benefits
Flexible Swile card to use as you wish (meal and grocery allowances).