Lead the architecture, design, and production deployment of scalable, high-throughput big data systems.
Architect, deploy, and manage the foundational data systems that underlie modern AI infrastructure, including vector, NoSQL, and document databases.
Develop end-to-end data engineering solutions, including robust ETL/ELT pipelines, API services, and data ingestion frameworks.
Design and build the storage and processing layers powering our analytics workloads: data lakes, data warehouses, distributed file systems, and real-time streaming architectures.
Engineer feature-rich context pipelines that process large-scale enterprise data, balancing batch and streaming patterns.
Optimize and scale large distributed queries and data transformations to ensure high performance and low latency for end users.
Implement data quality frameworks to measure and ensure data integrity, reliability, and governance across all data assets.
Collaborate with analytics, product, and platform teams to build data models that capture the semantics of customer metrics, hierarchies, and relationships.
Stay current with the modern data stack and big data landscape, evaluating new tools, distributed computing frameworks, and database technologies for potential adoption.
Requirements
7+ years of dedicated data engineering experience, demonstrating a strong track record of hands-on execution and delivery in complex data environments.
Deep practical understanding of the database ecosystems that power AI and machine learning infrastructure (e.g., vector databases, NoSQL, and document stores).
Hands-on experience building, scaling, and shipping large-scale data platforms in production.
Deep practical experience with distributed data processing frameworks (e.g., Apache Spark, Flink, Hadoop).
Strong expertise in message brokers and event streaming platforms (e.g., Apache Kafka, Kinesis).
Experience across the full data pipeline development lifecycle, including extensive work with workflow orchestration tools (e.g., Apache Airflow, Dagster).
Hands-on expertise with cloud data warehouses (e.g., Snowflake, BigQuery, Redshift) and data lake or lakehouse technologies (e.g., Databricks, Delta Lake, Apache Iceberg).
Advanced SQL skills and proficiency in Python, Scala, or Java.
Strong background in modern software development practices (testing, code review, CI/CD, Infrastructure as Code).
Tech Stack
Airflow
Amazon Redshift
Apache
BigQuery
Cloud
ETL
Hadoop
Java
Kafka
NoSQL
Python
Scala
Spark
SQL
Benefits
Our Winning Culture is the engine that drives our teams of innovators.
We champion diversity of thought and ideas.
We behave like leaders regardless of title.
We are committed to achieving ambitious goals.
We love celebrating our wins – big and small.
Diversity, equity, inclusion, and belonging (DEIB) improves our workforce, enhances trust with our partners and customers, and drives business success.
We hire you for who you are, and we want you to bring your authentic self to work every day!