Consensus Cloud Solutions is a leading digital cloud fax and interoperability solutions organization focused on empowering healthcare providers. The Site Reliability Engineer I will support tooling and technologies that underpin the company’s infrastructure, automating and streamlining the software delivery lifecycle while collaborating with engineering and security teams.
Responsibilities:
- Lead the design, development, and maintenance of secure, scalable, resilient, and cost-effective cloud infrastructure on AWS through a DevOps approach, using the existing IaC framework (Python, Terraform, and Terragrunt) that manages AWS resources; champion IaC best practices and ensure adherence to standards for security, reliability, performance, cost optimization, and operational excellence
- Design, implement, manage, and optimize robust CI/CD pipelines using tools like GitHub Actions and AWS CodePipeline for both infrastructure and applications; maintain deep expertise in GitHub
- Design, develop, and implement new tooling, applications, and platforms to improve and upgrade the capabilities of the IaC and automation platforms and their supporting infrastructure
- Provide expert DevOps-focused full-stack guidance and support to software engineering teams (using common languages such as Java, Python, Node, Go, etc.) to integrate DevOps practices, automate builds/deployments, identify/resolve reliability/performance bottlenecks, and establish comprehensive documentation
- Champion and implement DevOps best practices across teams, fostering a culture of collaboration, automation, and continuous improvement, as well as providing mentorship and leadership in DevOps methodologies, IaC, CI/CD, and cloud technologies
- Participate in grooming and prioritizing development work to extend and support the IaC, tooling, and infrastructure-support application platforms
- Partner with other teams across the technology group to propose and draft RFCs and standards for development of best practices and design patterns for applications and platforms using IaC and the established deployment pipelines and tooling
- Research, propose, and implement solutions to improve and upgrade cloud-based resources, infrastructure, and systems, ensuring they are performant, efficient, and resilient
- Initiate reviews to ensure existing platforms are resilient, efficient, and cost-effective
- Monitor ticket queues, provide timely and accurate updates, and resolve feature request and development tickets
- Monitor and respond to requests and questions in Slack channels, providing guidance and troubleshooting assistance to developers and team members
- Create tickets and participate in deployments following Change Management procedures
- Participate in a 24/7 on-call rotation to respond to and resolve production incidents; lead and contribute to blameless postmortems
- Define and manage Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
- Evaluate, implement, and support Open Source frameworks and projects (e.g., ECS/Docker, Prometheus, Grafana, ELK stack, Kafka)
- Update and maintain troubleshooting documentation and Methods of Procedure for deployments
- Ensure systems and users follow security standards and established policy; support audit processes
- Light travel to summit meetings or conferences may be required
- Perform other duties and responsibilities as required, assigned, or requested. Consensus reserves the right to add or change duties at any time
Requirements:
- A security clearance or the ability to obtain a security clearance is required
- 6+ years of hands-on experience managing and automating UNIX/Linux system environments within a DevOps context
- 5+ years of experience in a DevOps Engineer or SRE role with a strong DevOps focus, emphasizing infrastructure automation, CI/CD pipeline development, and cloud services
- 4+ years of experience designing and implementing infrastructure as code within the AWS ecosystem using Terraform
- Mastery of DevOps disciplines and processes, including building and managing CI/CD pipelines supporting Infrastructure as Code frameworks such as Terraform, and Continuous Delivery tools such as AWS CodePipeline, GitHub Actions, Jenkins, Git, and Artifactory
- Expert-level proficiency with Terraform and Terragrunt for managing AWS infrastructure as code
- Deep expertise in GitHub and GitHub Actions for CI/CD, including design, troubleshooting, and support
- Strong experience with AWS Cloud services (e.g., EC2, S3, RDS, VPC, IAM, Lambda, EKS/ECS, CloudWatch) and infrastructure design best practices, applied within a DevOps model
- Mastery of observability, monitoring, metrics, and alerting at scale across regionally and globally resilient, distributed platforms, using common open source frameworks such as Prometheus, Thanos, OpenTelemetry, and Grafana
- Experience providing DevOps-centric support for applications developed in Java, Python, and Angular, including build automation, deployment pipelines, and observability
- Expert-level proficiency in at least one scripting language (e.g., Python, Bash, Perl) and one programming language (e.g., Java, Go, Node). (Code samples and/or GitHub links to prior work desirable)
- Mastery of containerization (Docker) and strong familiarity with the container ecosystem, especially Amazon ECS
- Mastery of configuration automation tool sets such as AWS Config and/or SSM, Puppet, Ansible, and Chef, including solid knowledge of the concepts and practices surrounding such solutions
- Hands-on experience with APM tools such as Zipkin, Jaeger, OpenTelemetry, and New Relic
- Proficient with Jira, Confluence, and the Git toolset
- Hands-on experience with Agile/Scrum and Waterfall process environments
- Experience implementing and supporting a variety of Open Source frameworks and projects relevant to DevOps and SRE
- Consistently exhibits personal accountability for outcomes to all team members, peers, and stakeholders
- Able to prioritize and manage multiple projects simultaneously in order to meet deadlines
- Self-starter able to work independently with minimal supervision, with strong organization and communication skills to ensure alignment with team and project goals
- Driven to learn and stay abreast of the latest technologies and DevOps best practices
- Strong analytical and problem-solving skills with a proactive, blameless, and detail-oriented approach
- Excellent communication and collaboration skills, essential for fostering a DevOps culture and working effectively across teams; ability to mentor others
- Experience with PCI, HITRUST, FedRAMP/GovCloud, and/or similar certification methodologies
- Experience migrating teams to, and educating them on, newer SDLC and DevOps concepts
- Experience with APM/Observability and advanced DevOps/SRE concepts and methodologies
- Proven experience mentoring team members in DevOps practices
- Active, transferable U.S. security clearance at the Public Trust level or higher preferred