Harnham is an early-stage, cutting-edge organization seeking a Machine Learning Engineer to drive infrastructure scalability across state-of-the-art GenAI products. The role involves building and scaling ML infrastructure, optimizing models for performance, and architecting deployment strategies that support growth and increasing complexity.
Responsibilities:
- Build and scale ML infrastructure capable of serving high-volume, low-latency model inference
- Optimize models and pipelines for performance, cost, and reliability in production environments
- Productionize research and experimental models into scalable, maintainable ML systems
- Architect infrastructure and deployment strategies that support continuous growth and evolving model complexity
- Drive infrastructure development to support a diverse and growing portfolio of models
Requirements:
- Experience in ML platform infrastructure and deployment, including scaling training and inference, concurrency, queuing, back pressure, and orchestration
- Experience designing and operating high-performance model-serving systems, with proven ownership of system stability beyond initial deployment
- Engineer solutions that efficiently manage parallel inference workloads at scale
- Tune end-to-end serving pipelines to maximize responsiveness and overall system capacity
- Strong Python skills
- AWS-native stack experience
- Containerization and orchestration tooling: Docker, Kubernetes, SageMaker