BMC Helix is an innovative company focused on AI-driven IT solutions. The company is seeking a Principal DevOps Engineer/Architect to design, maintain, and support cloud deployments, drive architectural improvements, and troubleshoot complex systems.
Responsibilities:
- Participate in the design, troubleshooting, and support of private and public cloud deployments across AWS, Azure, IBM, and other IaaS platforms
- Evaluate, recommend, and troubleshoot platform deployment software, including Kubernetes and Rancher environments
- Develop and maintain automation using Terraform and Ansible
- Architect solutions for both bare-metal and virtualized environments, including KubeVirt
- Lead capacity and performance management initiatives, driving data center rollouts and consolidation efforts
- Create and document Standard Operating Procedures (SOPs), design documents, and architectural artifacts, while educating teams on the implementation of new cloud-based initiatives
- Demonstrate deep expertise in troubleshooting complex distributed systems, with hands-on experience resolving the most challenging issues
- Lead troubleshooting efforts from application-level problems down to network packet flows
- Conduct root cause analysis and system-wide diagnostics with a holistic view of the platform
- Serve as a technical mentor and thought leader within the organization
- Collaborate with cross-functional teams to ensure architectural alignment and operational excellence
- Operate effectively in ambiguous environments, taking initiative to drive clarity and progress
Requirements:
- 15–20+ years of experience in the IT industry, including more than 10 years specializing in the DevOps and SRE practices, technologies, and industry standards that keep production environments reliable, resilient, and scalable
- Bachelor's degree in a related field
- Advanced development skills, with the ability to select the best language for the task (including Python and shell scripting) and quickly produce working code in it
- Extensive expertise with infrastructure technologies such as Elastic, Redis, Kafka, TLS, TCP/IP, encryption, Linux internals, packet tracing, Jenkins, CI/CD, and GitOps pipelines
- Comprehensive knowledge and hands-on mastery of Kubernetes, Ansible, and Terraform, including container deployments, persistent storage, pods, networking, ingress, routes, and Kubernetes objects
- Significant experience deploying solutions in AWS, Azure, IBM, and private cloud environments
- In-depth knowledge of virtualization, SAN, and other storage architectures, as well as security practices
- Proven track record working with Change and Incident Management tools
- Ability to work independently with minimal supervision, consistently demonstrating initiative and ownership of outcomes
- Success in solving problems of diverse scope, requiring thorough evaluation and innovative approaches
- Sound judgment in selecting methods and techniques to achieve effective solutions
- Demonstrated ability to identify and resolve a wide range of issues in both practical and imaginative ways
- Skill in assessing and communicating risks based on complexity, resources, and timeline constraints
- Ability to work effectively in cross-functional teams and influence outcomes
- Excellent verbal and written communication skills for effective interaction with a global team
- Candidates must have US citizenship and the ability to obtain a security clearance
- Certifications such as ITIL, MCSE, VCP, AWS, GCP, or OCI are a plus
- Current security clearance strongly preferred