Lead the architectural design and technical roadmap for scalable, high-performance data processing pipelines capable of handling petabyte-scale telemetry data (logs, metrics, traces).
Drive the development and optimization of ML-driven data routing and transformation engines that reduce customer data volumes by 80% or more, and architect real-time analytics systems using advanced machine learning and LLMs.
Design cloud-native microservices and APIs that integrate seamlessly with major observability platforms (Splunk, Elastic, Datadog) while establishing robust monitoring, alerting, and observability solutions.
Lead cross-functional technical initiatives and decision-making forums, collaborating with Product, Data Science, and DevOps teams to translate strategic vision into technical solutions and company-wide standards.
Provide technical leadership and mentorship to senior and junior engineers, driving system performance and reliability optimization while establishing engineering best practices and culture.
Requirements
5+ years of software engineering experience focused on distributed systems, data engineering, or ML infrastructure, with expert-level proficiency in Go, Rust, or Java.
Extensive experience with cloud platforms (AWS, GCP, Azure) and container orchestration (Kubernetes), and a proven track record of leading the development and scaling of data pipelines using Kafka, Spark, or Flink.
Deep expertise in database technologies (SQL and NoSQL) and advanced experience with machine learning frameworks (TensorFlow, PyTorch) and MLOps practices for production ML systems.
Expert knowledge of observability tools, standards, and data formats, including Prometheus, Grafana, the ELK stack, OpenTelemetry, and Parquet, alongside extensive experience with Infrastructure as Code (Terraform).
Strong leadership and technical communication skills, with a track record of mentoring engineers, planning technical strategy, and driving decisions across multiple teams and stakeholders.
Tech Stack
AWS
Azure
Cloud
Distributed Systems
Google Cloud Platform
Grafana
Java
Kafka
Kubernetes
Microservices
NoSQL
Prometheus
PyTorch
Rust
Spark
Splunk
SQL
TensorFlow
Terraform
Go
Benefits
Medical, vision, and dental insurance; 401(k); commuter benefits; health and dependent care FSA
Unlimited PTO
Industry-leading gender-neutral parental leave
Paid company holidays
Paid sick time
Employee stock purchase program
Disability and life insurance
Employee assistance program
Gym membership reimbursement
Cell phone reimbursement
Numerous company-sponsored events, including regular happy hours and team-building events