Peraton is a next-generation national security company that drives missions of consequence spanning the globe. They are seeking an experienced Senior AWS Cloud Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of their cloud infrastructure on Amazon Web Services (AWS). The role involves collaborating with cross-functional teams to automate processes and improve release management.
Responsibilities:
- Design, implement, and manage infrastructure as code (IaC) solutions using tools like AWS CloudFormation, Terraform, or Helm Charts to automate continuous database deployment and scaling processes
- Implement robust monitoring and alerting systems to proactively identify and address potential issues before they impact system performance
- Conduct performance analysis and optimization of AWS infrastructure components to enhance system efficiency and reduce latency
- Participate in on-call rotations to respond to and resolve incidents promptly
- Work closely with security teams to implement and enforce best practices for securing AWS environments
- Facilitate clear communication across teams, providing updates on release status, known issues, and any potential impact on stakeholders
- Collaborate with development, QA, and operations teams to plan and coordinate database schema releases
- Develop and maintain automated deployment pipelines using industry-standard tools such as GitLab CI/CD, Liquibase, or similar
- Proactively identify areas for process improvement within the release management lifecycle
- Collaborate with QA teams to establish and execute release validation procedures
Requirements:
- Bachelor's degree and 8 years of experience, or a high school diploma and 12 years of experience
- Proven experience as a Site Reliability Engineer or in a similar role, with a strong emphasis on relational databases
- In-depth knowledge of AWS services like RDS and DynamoDB and expertise in managing cloud infrastructure
- Advanced-level programming and/or scripting in 3 or more of the following languages and tools: Python, Java, Chef, Helm, Playwright, Bash, JavaScript, Terraform
- Strong understanding of DevOps principles and continuous integration/continuous deployment (CI/CD) pipelines
- Proficiency in CI/CD tools such as GitLab CI/CD, Liquibase, or others
- Familiarity with infrastructure as code (IaC) tools like CloudFormation, Terraform, Helm Charts, or similar technologies
- Hands-on experience with version control systems (GitLab, GitHub, AWS CodeCommit) and branching strategies
- Experience with containerization and orchestration tools (e.g., Amazon Elastic Container Service (ECS), Amazon Elastic Kubernetes Service (EKS), Docker, Kubernetes)
- Familiarity with monitoring tools (e.g., CloudWatch, Prometheus, Grafana, Datadog) and log analysis
- Attention to detail, with a focus on maintaining high-quality software releases
- Solid understanding of Agile methodologies and their application in release management
- Excellent problem-solving and troubleshooting skills
- Strong communication and collaboration skills
- Must be a US Citizen
- Must be able to obtain and maintain the required agency clearance (6C Public Trust)
- Relevant certifications in DevOps or related fields are a plus
- High Risk Public Trust or Secret Clearance preferred
- 3 or more years in an SRE or Platform Engineering group supporting high-availability, critical platforms and applications
- 2 or more years managing relational databases