About this role
Meta is seeking a forward-thinking engineer to join the Production Operations team within our Data Centers. These Data Centers are the foundation upon which our rapidly scaling infrastructure efficiently operates and upon which our innovative services are delivered. Meta is at the leading edge of the global data center industry in terms of both how data centers are designed and how they are operated. This person should enjoy working in a fast-paced, technical environment where adaptability and flexibility will be key to their success.
We seek an IT professional with technical skills in server hardware and Linux - ideally in a Data Center environment. The candidate should have knowledge or experience in a few of the following core areas: Hardware repair, OS management, Tooling and Automation, Networking, or Technical Project Management.
Responsibilities
* Support platform health by successfully resolving and closing tickets, including but not limited to remote troubleshooting and physical inspection of hardware in data halls.
* Troubleshoot and diagnose technical issues within the data center, including automated tooling, hardware failures, and network issues.
* Perform repairs where applicable, and consistently resolve problems.
* Build cross-functional relationships and influence policies and procedures that improve data center operations.
* Support the introduction of new platforms and hardware to the site.
* Continuously evaluate and identify areas for improvement in processes, tools, and systems to optimize efficiency and quality of repairs.
* Use dashboards and metrics to maximize server uptime and utilization rates, and to understand hardware failure rates and service level agreements.
* Collaborate with team members to evaluate and identify better ways to resolve issues and define updates to tools and processes.
* Provide engineering support and be a technical resource for peers and partner teams.
* Maintain and update documentation such as procedures, runbooks, and guides.
* Participate in 24/7 on-call rotation.
* Ability to travel up to 15% of the time.
* Required to work a shifted schedule (includes nights and weekends).
Qualifications
* Must obtain work authorization in the country of employment at the time of hire and maintain ongoing work authorization during employment
* Currently has, or is in the process of obtaining, a Bachelor's or Master's degree in a technical field, or equivalent experience/certification
* Ability to communicate effectively, in a clear and concise manner, appropriately tailoring messages to the audience
* Hands-on experience and knowledge of hardware systems and components
* Knowledge of Linux (or equivalent OS), server hardware repairs and networking
* Basic understanding of data center mechanical & electrical infrastructure
* Experience in debugging, modifying, and developing in at least one commonly used scripting or programming language: Bash, PHP, Python, SQL, Rust, Go, or Perl
* Time and project management experience
* Working knowledge of storage platforms such as NAS, SAN, or interconnected server hardware
* Ability to debug and troubleshoot issues in a Linux server environment
* Basic-level knowledge of the interdependencies of data center functions and technologies including electrical, cooling, structured cabling, security, and network
* Knowledge of out-of-band/lights-out server communication methods, such as IPMI and serial console
* Working conceptual knowledge of technologies such as HTTP, DNS, RAID, and DHCP