Skild AI is building the world's first general-purpose robotic intelligence that adapts to unseen scenarios. The company is seeking a Senior Software Engineer to build and scale training infrastructure and tools for machine learning applications in robotics.
Responsibilities:
- Architecting, building, and maintaining distributed training pipelines and frameworks spanning data ingestion and preprocessing, large-scale training, and evaluation
- Optimizing training performance and resource utilization by identifying bottlenecks and implementing improvements in data loading, I/O, caching, sharding, and prefetching
- Integrating state-of-the-art ML techniques into production training systems in collaboration with research/ML teams
- Implementing monitoring, logging, alerting, automated testing, and CI/CD for reliable training operations
- Building developer tooling and documentation, including dashboards and utilities, to streamline experimentation at scale and improve engineer productivity