Delos Data is a stealth-mode startup building foundational technology for AI data center clusters. They are seeking a Software Development Engineer in Test (SDET) to ensure the reliability and performance of their core AI infrastructure by developing automation frameworks and conducting stress tests.
Responsibilities:
- Design, develop, and maintain an automated testing framework, built from the ground up, that supports distributed AI training and inference workloads
- Develop complex test plans that go beyond unit tests, focusing on end-to-end system integration, stress testing, and hardware-software boundary conditions
- Partner closely with System Engineers to debug deep-seated issues in distributed clusters, using telemetry and profiling tools to identify bottlenecks
Requirements:
- Strong proficiency in Python (for automation and orchestration)
- Proven experience building or extending test automation frameworks for complex back-end systems
- Proven ability to troubleshoot automated test failures in complex, large-scale distributed systems and identify their root causes
- Experience with containerization (Docker/Kubernetes) and modern CI/CD tools (GitHub Actions, GitLab CI, or Jenkins)
- Bachelor's or Master's degree in Computer Engineering, Computer Science, or a related field
- Experience with Kubernetes, Terraform, or Ansible for managing test-bed environments
- Experience testing high-performance networking protocols or distributed file systems
- Experience testing software that interacts directly with drivers or firmware