Role Overview
- Architect and Evolve the Private Cloud: Own the end-to-end architecture of our OpenStack-based private cloud (Nova, Neutron, Cinder, Glance, Keystone, Placement, Octavia, Designate) across multiple regions and data centers.
- Lead Ceph at Scale: Design, deploy, and operate large Ceph clusters, including tiering, replication strategy, performance tuning, and lifecycle upgrades.
- Operate Bare Metal as a Service: Architect and operate an Ironic-based BMaaS platform integrated with OpenStack and/or Kubernetes, covering hardware enrollment, inspection, image deployment, network provisioning, and lifecycle management at fleet scale.
- Data Center Expansion: Lead the technical design of new data center buildouts and bare-metal automation, defining reference architectures, hardware standards, and capacity models.
- Infrastructure as Code: Set the standards for IaC across the platform using Terraform, Ansible, Packer, and configuration management, including OpenStack lifecycle tooling (e.g., Kolla-Ansible, OpenStack-Ansible, Charmed OpenStack).
- Networking Architecture: Define the SDN strategy (OVN/OVS), tenant networking, provider networks, BGP/EVPN integration with the underlay, and high-performance datapaths (SR-IOV, DPDK) for latency- and throughput-sensitive workloads.
- Kubernetes on Bare Metal: Architect Kubernetes platforms running on Ironic-provisioned hardware and OpenStack, including service mesh, CNI selection, and integration with Ceph CSI for persistent storage.
- Performance, Capacity, and Cost: Drive capacity planning, performance benchmarking, and TCO modeling for compute, storage, and bare-metal fleets — ensuring efficient resource utilization and predictable scaling.
- Reliability and Operability: Partner with the SRE team to define SLIs/SLOs for platform services, design failure domains, validate disaster recovery, and ensure operational maturity of every new component introduced.
- Security and Compliance: Work closely with Security to harden the platform end-to-end: secure boot, image signing, IAM (Keystone federation, OIDC, LDAP/AD), tenant isolation, and compliance with CIS, ISO 27001, and GDPR.
- Technical Leadership: Mentor senior and mid-level engineers, lead design reviews, write RFCs and architecture decision records, and represent Infrastructure in cross-functional technical forums.
- Support and Troubleshooting: Act as the deepest escalation point for the platform, participate in on-call for critical incidents, and lead complex root-cause investigations across compute, storage, and network layers.
Requirements
- Experience: Minimum 8 years in infrastructure engineering, with a track record of architecting and operating large-scale production platforms.
- OpenStack Expertise: Deep, hands-on experience designing, deploying, and operating production OpenStack clouds — including upgrades, HA control plane design, multi-region topologies, and at least one major lifecycle tooling stack (Kolla-Ansible, OpenStack-Ansible, or equivalent).
- Ceph Expertise: Proven experience operating Ceph clusters in production, including CRUSH design, performance tuning, troubleshooting OSDs/MONs/MDS, RGW at scale, and Ceph version upgrades.
- Ironic / BMaaS: Hands-on experience with Ironic for bare-metal provisioning at fleet scale, including driver selection (IPMI, Redfish), inspection, cleaning, deployment workflows, and integration with Nova or standalone BMaaS use cases.
- Linux Expertise: Expert-level Linux systems engineering on Red Hat / Rocky Linux (or similar), including kernel tuning, networking stack, storage subsystem, and performance analysis.
- Networking: Deep understanding of high-performance and SDN networking, including OVN/OVS, BGP/EVPN, VXLAN, SR-IOV, DPDK, NFV, and core network and routing protocols (TCP/IP, BGP, OSPF, DNS).
- Kubernetes & Containerization: Strong experience with Kubernetes, Helm, and service mesh technologies (e.g., Istio), including running Kubernetes on bare metal and on OpenStack.
- Infrastructure as Code: Expert in Terraform, Ansible, and Packer; experience with at least one configuration management system (Puppet, Salt, or equivalent).
- System Architecture: Strong knowledge of hardware (CPU, memory, NUMA, NICs, RAID, HBA, NVMe, iSCSI, Fibre Channel, Ethernet) and the ability to translate hardware characteristics into platform design decisions.
- Security & Compliance: Experience with system hardening, compliance frameworks (CIS, GDPR, ISO 27001), and IAM systems (Keystone federation, OpenLDAP, Active Directory, PAM).
- Monitoring & Observability: Expert in Prometheus, Grafana, Alertmanager, and Loki; experience in instrumenting OpenStack and Ceph specifically.
- Scripting & Programming: Proficient in Python and Go for platform-level automation and tooling; strong Bash for operational work.
- Storage Technologies: Deep knowledge of SDS, distributed storage trade-offs, and data protection strategies (backup, snapshots, disaster recovery, geo-replication).
- Technical Leadership: Demonstrated ability to lead architecture decisions, mentor senior engineers, and drive consensus across teams without direct management authority.
- Problem-Solving: Strong problem-solving mindset and ability to lead complex investigations under pressure.
Tech Stack
- Ansible
- Cloud
- DNS
- Grafana
- Kubernetes
- Linux
- OpenStack
- Packer
- Prometheus
- Puppet
- Python
- SaltStack
- TCP/IP
- Terraform
- Go
Benefits
- Growth Opportunities: Advance your career in one of the fastest-growing telecommunications companies, expanding over 100% year-on-year under the leadership of successful tech entrepreneurs.
- Major Transaction Exposure: Be in the driver’s seat for transactions that will have an impact on the future of the telco industry.
- Work with a Talented Team: From the Board and the Founders to the Senior Management Team, you will collaborate daily with the most capable and renowned external advisors and be constantly exposed to talented and driven individuals.
- Dynamic Work Environment: Thrive in a collaborative, fast-paced workplace where innovation is encouraged, and every contribution counts.
- Professional Development: Work alongside industry experts to enhance your skills and knowledge in a cutting-edge field.
- International Experience: Gain opportunities to work in different 1GLOBAL offices around the world as you grow within the company.
- Open Communication Culture: Join a team where your ideas are heard, and open dialogue is encouraged, fostering a supportive and transparent work environment.
- Get Things Done Attitude: Be part of a results-driven team that values efficiency, creativity, and the drive to make a tangible impact in the industry.
1GLOBAL is an equal opportunity employer, and we value your character as much as your talent. Diversity drives our innovation, and we offer a collaborative, dynamic, and international work environment. We are excited for you to join our mission to revolutionise connectivity globally.