Work with internal business partners to identify, capture, collect, and format data from external sources, internal systems, and the data warehouse to extract features of interest
Contribute to evaluation, research, and experimentation efforts with batch and streaming data engineering technologies in a lab environment to keep pace with industry innovation
Work with data engineering groups to inform on and showcase the capabilities of emerging technologies, and to enable adoption of these technologies and their associated techniques
Coordinate with Privacy Compliance to ensure proper data collection and handling
Create and implement business rules and functional enhancements for data schemas and processes
Monitor data loads and resolve load issues
Work with internal business clients to troubleshoot data availability and activation issues
Requirements
0–2 years of hands-on data engineering experience required
Bachelor’s Degree in related field or equivalent experience required
Experience with processing large data sets using Hadoop, HDFS, Spark, Kafka, Flume or similar distributed systems
Experience ingesting a variety of source data formats (e.g., JSON, Parquet, SequenceFile) from sources such as cloud databases, MQ, and relational databases such as Oracle
Experience with cloud technologies (such as Azure, AWS, GCP) and native toolsets such as Azure ARM templates, HashiCorp Terraform, and AWS CloudFormation
Working knowledge of object storage technologies, including but not limited to Azure Data Lake Storage (ADLS) Gen2, S3, MinIO, and Ceph
Experience with containerization, including but not limited to Docker, Kubernetes, Spark on Kubernetes, and the Spark Operator
Strong background with source control management systems (Git or Subversion); build systems (Maven, Gradle, Webpack); code quality (Sonar); artifact repository managers (Artifactory); and continuous integration/continuous deployment (Azure DevOps)
Experience with NoSQL data stores such as Cosmos DB, MongoDB, Cassandra, Redis, or Riak, or technologies that embed NoSQL with search, such as MarkLogic or Lily Enterprise
Experience creating and maintaining ETL processes
Experience with Adobe solutions (ideally Adobe Experience Platform, DTM/Launch) and REST APIs
Understanding of cloud solutions such as Google Cloud Platform, Microsoft Azure, and Amazon Web Services (AWS), including their architecture and services
Understanding of GDPR, privacy, and security topics
Tech Stack
AWS
Azure
Cassandra
Cloud
Distributed Systems
ETL
Google Cloud Platform
Gradle
Hadoop
HDFS
Kafka
Kubernetes
Maven
MongoDB
NoSQL
Oracle
Redis
Spark
Subversion
Terraform
Webpack
Benefits
401(k) matching
Bonding leave for new parents (12 weeks, 100% paid)