Coalfire is on a mission to make the world a safer place by solving clients’ hardest cybersecurity challenges. They are seeking a Senior Site Reliability Engineer to lead complex initiatives and enhance managed services capabilities, focusing on engineering solutions for vulnerability management and compliance requirements.
Responsibilities:
- Hands-on engineering work, including developing new deployments, automation scripts, and tooling to meet client deliverables focused on vulnerability management, infrastructure updates, and compliance requirements
- Develop and maintain Infrastructure-as-Code (IaC), utilizing Coalfire standard modules for Terraform, Ansible, and CI/CD pipelines across projects
- Partner with Technical Managers and engagement leads to evaluate risks, prioritize issues, and develop actionable mitigation plans across an SRE team’s portfolio of M&O clients
- Contribute to technical playbooks, standards, and frameworks for operational excellence in managed services delivery
- Own the patch management strategy for assigned environments, ensuring regulatory compliance and timely remediation of vulnerabilities
- Oversee Identity and Access Management (IAM), implementing and enforcing security best practices to protect sensitive data and ensure proper access controls
- Perform cloud administration and system administration tasks, such as provisioning resources, optimizing performance, and troubleshooting infrastructure issues
- Adhere to established quality standards for engineering deliverables, aligning with internal protocols, compliance regulations, and project deadlines
- Identify and communicate potential risks, working with relevant stakeholders to incorporate mitigation strategies that meet regulatory and client expectations
- Contribute to day-to-day agile project management tasks, including tracking progress, providing updates, and ensuring assigned activities are completed on schedule
- Mentor junior engineers, review their work, and help mature engineering practices
Requirements:
- 5–7 years in systems engineering/SRE with increasing responsibility, including architecture design, operations, and automation
- 4+ years in cloud infrastructure management (AWS, Azure, or GCP) with multi-account and multi-environment experience
- 4+ years developing and maintaining IaC with Terraform/Ansible at scale
- Direct experience leading at least one operational improvement (e.g., reducing toil, enhancing SLAs, improving incident response)
- AWS Certified Solutions Architect - Professional certification
- Demonstrated experience driving at least one successful team initiative and serving as the technical SME for a complex initiative
- Advanced Cloud Expertise: Strong hands-on experience with a major public cloud (AWS, Azure, or GCP), including architecture, security, and performance optimization
- IaC Leadership: Deep understanding of Terraform and modern CI/CD automation practices; capable of reviewing and improving team IaC workflows
- Operational Excellence: Ability to lead troubleshooting of high-impact incidents, implement monitoring solutions, and improve system reliability
- Security-First Mindset: Experienced in aligning engineering solutions with frameworks such as FedRAMP, CIS, and NIST
- Collaboration & Leadership: Proven ability to lead cross-functional projects, mentor team members, and influence stakeholders
- Documentation & Communication: Skilled at creating technical documentation, architecture diagrams, and presenting complex solutions clearly to both technical and non-technical audiences
- Windows and Linux Administration: Experience managing and maintaining Windows and Linux server environments, including system hardening, GPO configuration, user management, OS-level troubleshooting, and consistent patching across hybrid environments
- US citizenship (required due to client contractual requirements)
Preferred Qualifications:
- Advanced or specialty cloud certifications (for example, AWS DevOps Engineer)
- CISSP (Certified Information Systems Security Professional) or comparable cybersecurity certification
- Modern Architecture Expertise: Serverless, containers (Docker/Kubernetes), microservices, and event-driven systems
- Tools: Familiarity with Visio, LucidChart, Jira, or similar platforms for diagramming and project coordination
- Compliance-Driven Engineering: Prior work in regulated industries (FedRAMP, HIPAA, PCI)
- Exposure to large-scale or high-availability production environments (24x7)
- Familiarity with encryption, PKI, and security baselines (for example, FIPS 140-2, CIS Benchmarks, DISA STIG)
- Previous experience in technical consulting engagements or cross-functional collaboration (for example, with security or compliance teams)
- Additional hands-on work with continuous monitoring and vulnerability management