Doghouse Recruitment is a cloud technology company driving the next generation of AI infrastructure. They are seeking an AI Infrastructure Engineer to build automation and lifecycle systems for a global GPU-cluster fleet. The role involves working with cutting-edge NVIDIA hardware and designing scalable solutions.
Responsibilities:
- Design and develop backend services and automation tooling in Python
- Build and maintain provisioning, testing, and lifecycle management systems for physical hardware, including software that runs directly on bare-metal environments
- Integrate with Linux systems using shell scripting and low-level tooling, and implement CI/CD pipelines for infrastructure-focused software
- Work across networking layers (IPv4/IPv6, DHCP, DNS, network boot) and interface with hardware management controllers and their protocols
- Design NoSQL data stores for system state and orchestration
- Support ARM64 architectures, write clear documentation, and uphold operational excellence across large machine fleets
Requirements:
- Strong Python engineering experience with a solid Linux and shell scripting background
- Hands-on familiarity with bare-metal servers, networking fundamentals, and hardware management interfaces and APIs
- Experience with CI/CD pipelines and NoSQL databases
- The ability to debug complex issues spanning software, hardware, and networks
- A strong ownership mindset and clear communication skills in a distributed team
- Experience at large infrastructure scale, with ARM platforms in production, or in hardware testing and factory provisioning
- A background in infrastructure automation, internal platform tooling, or open-source systems software