DigitalOcean is a company focused on building scalable cloud solutions. They are seeking a motivated and experienced infrastructure engineer to architect, build, and support their Provisioning Automation system, with a primary mission to deliver GPU systems rapidly.
Responsibilities:
- Developing impactful, new and innovative systems that will help DigitalOcean scale
- Responding to provisioning failures
- Working to ensure that common provisioning failures do not recur (likely via automation)
- Collaborating with sibling teams to deliver on wider organizational goals
- Bringing new, actionable information to light by developing visualization tooling
- Having fun with an amazing and welcoming team 🙂
Recent team accomplishments:
- Developed a “provisioning-specific view” in our visualization interface
- Architected the MVP of an automated provisioning system
- Manually provisioned 300+ systems in order to meet aggressive deadlines (we’re not above manual work to hit our goals and feel the pain of our customers!)
- Developed alerts for out-of-date hardware firmware
Requirements:
- Programming Languages: Python, Golang, shell
- Systems: Linux, Containers, StackStorm, Ansible
- Theory: Distributed Systems, Complex System Failure, Resilient Architecture, Quality Engineering
- Significant experience administering Linux servers
- Strong experience with Python, Ruby, or Golang
- Familiarity with Git
- Familiarity with shell scripting
- Familiarity with continuous integration systems and concepts
- Excellent written and verbal English communication skills
- Comfort executing in an asynchronous remote environment
- Transparency, honesty, and openness to constructive feedback
- A desire to work with a respectful and inclusive team
- StackStorm experience
- Familiarity with GitHub Actions