Initialized Capital is focused on building quantum-accelerated AI servers to enhance AI training and inference. The ML Infrastructure Engineer will build and manage the compute platform for the AI & Algorithms team, ensuring reliable GPU access and scalable workloads across multiple cloud providers and on-premises servers.

Responsibilities:

Build compute abstractions that handle the team's diverse workloads: GPU-accelerated simulation, distributed training, high-throughput CPU jobs, and interactive analysis -- across PyTorch, JAX, and scientific computing frameworks
Stand up experiment tracking and reproducibility infrastructure
Create developer tooling that makes cloud compute feel local: environment setup, job submission, monitoring, and artifact management
Scale experiments from single-GPU prototyping to multi-node production runs
Design multi-provider workload orchestration: route jobs based on cost, availability, and capability
Manage and optimize spend across cloud providers -- track credit balances, burn rates, and expiration dates
Configure hybrid local + cloud workflows as on-prem GPU infrastructure comes online
Coordinate with our infrastructure engineer on cloud administration and security
Build CI/CD pipelines for research workloads: automated testing, evaluation benchmarks, artifact management
Create data generation and preprocessing pipelines at the throughput the team's simulators demand
Set up monitoring, alerting, and cost dashboards that surface problems before researchers hit them
