Harnham is a mission-driven AI research organization operating at the cutting edge of large-scale data infrastructure and machine learning. As a Senior Infrastructure Engineer, you will design and optimize the core data pipelines and backend systems that power advanced AI research at scale.
Responsibilities:
- Design and optimize high-performance data pipelines for distributed training and large-scale storage using tools such as Arrow, DuckDB, LanceDB, BigQuery, and vector databases
- Drive low-level performance optimization across the stack: latency, throughput, GPU utilization, and reliability
- Build and maintain monitoring and observability tooling for data quality, pipeline performance, and experiment tracking
- Optimize distributed AI workloads for efficiency and scale across cloud infrastructure (primarily GCP)
- Architect public-facing data infrastructure capable of serving large, heterogeneous, multimodal datasets to a global research community
- Scope and supervise projects so that interns, PhD students, and post-docs can contribute effectively
- Set engineering standards and best practices across the infrastructure function
- Support technical hiring and help shape the growth of the engineering team
- Act as a bridge between research and engineering, translating prototype workflows into production-grade systems
Requirements:
- 5+ years of backend or infrastructure engineering experience
- Strong Python programming skills (Go is a strong plus; C++, Rust, or CUDA is a bonus)
- Proven experience building and supporting ML/AI infrastructure in production environments
- Hands-on experience with containerization and infrastructure-as-code (IaC): Docker, Kubernetes, Terraform
- Experience with cloud platforms (GCP preferred, AWS or Azure also considered)
- Proficiency with high-performance data tools such as DuckDB, Apache Spark, or Delta Lake
- Experience with distributed systems and large-scale data storage
- A backend-first, performance-obsessed mindset
- Experience mentoring junior engineers or researchers and breaking down complex technical problems
- GPU orchestration and large-scale model training experience
- HPC infrastructure experience (Slurm, Kubernetes clusters)
- Familiarity with ML platforms (Vertex AI, SageMaker) and frameworks (PyTorch, JAX)
- Monitoring stack experience (Prometheus, Grafana)
- Background in multimodal, audio, or large-scale scientific data
- Full-stack exposure (React or similar) sufficient to guide others