Harnham is seeking a strong Senior Data Engineer to take ownership of their data infrastructure and backend services. The role involves designing, building, and operating data services and pipelines, while collaborating with cross-functional teams to ensure reliable and performant data solutions.
Responsibilities:
- Build and operate production data services and APIs on AWS, deploying and managing containerized applications on Kubernetes (EKS) with full ownership of reliability and performance
- Implement and scale vector search infrastructure using Databricks Vector Index and Milvus to power audience matching, similarity retrieval, and AI-driven product features across 50M+ records and growing
- Build and optimize data pipelines and ETL/ELT workflows in Python and SQL, integrating with Databricks and Snowflake where needed to support model-serving and feature delivery
- Architect scalable, cost-effective cloud infrastructure on AWS (EKS, S3, RDS, Lambda, SQS/SNS) that supports real-time and batch workloads for campaign data, audience signals, and embedding generation
- Serve as the data infrastructure and platform expert across data science, product, and client teams—translating product requirements into reliable, performant data services and pipelines
- Own service reliability, monitoring, and incident response; surface infrastructure gaps and performance bottlenecks to inform the platform roadmap
- Contribute to internal tooling, observability frameworks (logging, metrics, alerting), and engineering best practices across the team
Requirements:
- 5+ years in data engineering, backend engineering, or platform infrastructure roles
- Bachelor's degree required
- Strong hands-on experience with AWS (EKS, S3, RDS, Lambda, IAM, CloudFormation or Terraform) and deploying, scaling, and troubleshooting containerized applications on Kubernetes
- Ability to write production-grade, testable, and well-documented backend service code
- Strong proficiency in Python, SQL, Spark, and PySpark, with hands-on Databricks experience (Delta Lake, Jobs, Workflows)
- Practical experience building data services, working with vector databases (Milvus, Databricks Vector Index), and operating high-throughput systems on AWS
- Experience with CI/CD workflows (GitHub Actions), Docker, Helm, and infrastructure-as-code (Terraform or CloudFormation)
- Excellent communication skills; ability to translate technical findings for non-technical stakeholders
- Experience with embedding generation, ML model serving, or building feature stores for production ML workloads
- Working familiarity with Snowflake or dbt for analytics and transformation workloads
- Familiarity with adtech and digital advertising
- Experience with event-driven architectures, streaming systems (Kafka, Kinesis), or real-time data processing at scale