Build and manage reliable data pipelines covering ingestion/collection, processing, integration, storage, and data availability across the organization
Work within a distributed systems architecture for massively parallel processing (MPP), combining heterogeneous data sources and collaborating with analytics and data science teams to build solutions and deliver data-driven value
Act as a data analyst/engineer to support the client with data engineering and validation in a Big Data environment
Work on both the development of data loading workflows and the validation of business rules and data quality
Manage and organize large-scale data
Collect, transform, store, distribute, and ensure the availability of data
Perform data cleansing and maintain data quality
Requirements
Hands-on experience with ingestion, integration, processing, and storage of large volumes of data
Experience working on Big Data projects
Experience with ETL tools
Knowledge of relational and dimensional data modeling (Data Warehouse)
Understanding of the Lambda architecture
Experience with relational and non-relational databases
Experience with the AWS Big Data toolset, such as Amazon EMR, Amazon Kinesis, Amazon Redshift, Amazon S3, AWS Glue, Amazon Athena, and Elasticsearch
Experience with Hadoop ecosystem technologies such as HDFS, HBase, MapReduce, Spark, and Hive
Proficiency in Python, Java, and/or Scala
Knowledge of infrastructure-as-code provisioning tools such as Terraform and CloudFormation
AWS certifications are a plus
Specialization in Business Intelligence, Big Data, or Distributed Software Architecture is desirable
Tech Stack
Amazon Redshift
AWS
Cloud
Elasticsearch
ETL
Hadoop
HBase
HDFS
Java
MapReduce
Python
Scala
Spark
Terraform
Benefits
Swile flexible card to use as you wish (meal and food allowance)