Harrison Clarke is a high-growth AI startup seeking AI Infrastructure Engineers to design and scale the cloud-native foundation powering its AI platform. The role centers on building robust, cost-efficient infrastructure for large-scale, real-time AI workloads, with end-to-end ownership of the infrastructure stack.
Responsibilities:
- Architect and scale infrastructure for low-latency, high-throughput AI inference
- Orchestrate GPUs and manage multi-tenant workloads using Kubernetes and service mesh technologies
- Build and operate core systems including infrastructure-as-code, observability, distributed storage, and networking
- Implement cross-platform capabilities such as authentication, rate limiting, monitoring, and telemetry
- Define the infrastructure roadmap and drive trade-offs across performance, reliability, and cost
- Collaborate closely with ML engineers to productionize and optimize model serving pipelines
Requirements:
- Strong background in infrastructure engineering, DevOps, or ML platform engineering
- Deep expertise in Kubernetes at scale, GPU orchestration, and cloud-native automation
- Experience designing highly available, globally distributed systems and global traffic routing
- Proficiency with infrastructure-as-code, CI/CD pipelines, and observability tooling
- Solid understanding of distributed systems, performance optimization, and cost management
- Experience with ML inference frameworks (e.g., Triton, ONNX Runtime, vLLM, TensorRT)
- Strong grasp of cloud security and data management for production AI workloads
- Entrepreneurial mindset with the ability to operate autonomously in fast-paced environments