DigitalOcean is a cutting-edge technology company focused on simplifying cloud infrastructure. They are seeking a Hardware Sustaining Engineer to support the operational lifecycle of their server hardware and collaborate with various teams to resolve hardware and firmware issues.
Responsibilities:
- Act as a member of the Sustaining Engineering team in the Infra::Machines::Design Organization
- Support server hardware, cabling, and networking hardware throughout its operational lifecycle
- Monitor the #machines channel and MACHINES JIRA project for issues and drive them to resolution
- Participate in 24/7 on-call rotation with other members of the team
- Act as Tier 2 escalation for Datacenter Operations (DCOPS) and Cloud Operations (CloudOps) regarding hardware and firmware components
- Develop and maintain standards and practices for DigitalOcean hardware operations
- Work closely with the Qualification team, Firmware team, Fleet Lifecycle Engineering team (FLE), Foresight team, and Infrastructure Services team to resolve issues in tooling, firmware packages, hardware components, and other operational concerns
- Help develop tooling and associated runbooks to close gaps in operational capabilities around hardware and firmware
- Coordinate with Ops teams on monitoring thresholds, failure modes, and alerting
- Assist in troubleshooting the causes of failures and work to prevent them from recurring
- Raise the quality bar in the delivery of our cloud infrastructure by identifying industry best practices and working to adopt them
Requirements:
- Technical Degree (BS Computer Science/Engineering) or equivalent practical experience
- Hands-on experience operating cloud infrastructure at mid-tier scale or larger
- An in-depth understanding of server hardware, firmware, and infrastructure
- Strong knowledge of troubleshooting techniques, Python, and Bash
- Clear communication and effective collaboration with key stakeholders
- An insatiable passion for constant improvement