DigitalOcean is a cloud infrastructure provider seeking an entry-level Systems Engineer I to optimize and troubleshoot data center hardware. The role involves qualifying and deploying firmware packages across DigitalOcean products and components, working closely with vendors and internal teams to enhance operational capabilities.
Responsibilities:
- Work with vendors and internal peer teams on qualifying, onboarding, and delivering new firmware to the DigitalOcean ecosystem
- Act as Tier 3 escalation on-call for triage, investigation, and resolution of system firmware issues in the DigitalOcean fleet (both customer-facing and internal)
- Participate in a 24/7 on-call rotation with other members of the team
- Improve existing firmware and hardware configuration automation and validation, for both hardware platforms and components (such as NIC, storage, and BMC)
- Engage with hardware vendors about new automation features and existing bugs
- Help with development of tooling and associated runbooks to address gaps in operational capabilities around hardware and firmware operations
- Coordinate with Ops teams on monitoring thresholds, failure modes and alerting
- Assist in troubleshooting causes of failures and work to prevent them in the future
- Raise the quality bar in the delivery of our cloud infrastructure by identifying industry best practices and working to adopt them
Requirements:
- Technical degree (BS in Computer Science/Engineering) or equivalent practical experience
- Strong understanding of x86 server hardware architecture and subsystems
- Demonstrated professional proficiency in configuration management best practices (we use Ansible and Chef)
- Experience automating server firmware components at large scale using industry-standard tooling (Redfish, IPMI, etc.), including a deep understanding of benchmarking, test framework automation, and process automation in general
- Practical knowledge of PXE boot, UEFI, Linux/OS boot, AMI/OEM BIOS distributions, OpenBMC/AMI/OEM BMC implementations, RAID and other storage resiliency technologies, and the full network stack, from NIC firmware to TCP/IP
- Adept at Linux (or Unix) operating systems
- Comfortable with version control systems (we use Git) and proficient in at least one programming language (such as Python or Go)
- Ability to participate in a 24/7 on-call rotation with other members of the team
- Excellent communication skills, both within the team and with the broader company
- An insatiable passion for hardware, both new and old
- Ideally, you've worked with non-x86 hardware too!