ALL KNOWN SERVICES is seeking a Senior Infrastructure Engineer to operate and maintain physical infrastructure for private cloud platforms. The role involves managing datacenter operations, troubleshooting hardware issues, and collaborating with various engineering teams.
Responsibilities:
- Operate and maintain physical infrastructure supporting private cloud platforms
- Perform day-2 datacenter operations, including installation, firmware upgrades, patching, and hardware lifecycle management
- Manage capacity planning, refresh cycles, and infrastructure expansion
- Troubleshoot complex issues across server hardware, storage systems, and network connectivity
- Manage baseboard management controllers such as iDRAC and iLO, including through the Redfish API
- Collaborate with platform engineering, SRE, networking, and security teams
- Improve system reliability through automation, monitoring, and operational standardization
- Develop automation tools for infrastructure provisioning using Ansible and CI/CD pipelines
- Lead incident response, root cause analysis, and corrective actions
- Maintain documentation, runbooks, and operational playbooks
- Mentor junior engineers and help mature operational processes
Requirements:
- 7+ years of experience in infrastructure or datacenter engineering
- Strong hands-on experience with server, storage, and networking hardware at scale
- Deep understanding of x86 architecture, CPU topology, memory configurations, NUMA, and I/O subsystems
- Experience operating enterprise storage systems, with an understanding of their performance characteristics and failure modes
- Solid understanding of datacenter networking (L2/L3, VLANs, bonding)
- Strong Linux system administration and OS-level troubleshooting skills
- Experience with hardware lifecycle management, vendor coordination, and capacity planning
- Proven ability to automate infrastructure operations
- Experience supporting private cloud or on-prem cloud environments
- Experience with Dell OpenManage Enterprise (OME), HPE OneView, or Cisco Intersight
- Familiarity with bare-metal provisioning frameworks
- Knowledge of Redfish, IPMI, or other hardware management APIs
- Experience operating multi-datacenter infrastructure environments