Design, build, and maintain scalable, reliable infrastructure and services that support Hosting Services, Site Reliability Engineering, virtualization, and data center operations for a federal customer.
Collaborate closely with software developers, infrastructure engineers, and IT operations teams to plan and execute deployments, improve system architectures, and enhance service reliability.
Use automation and scripting (e.g., Python, Bash) to reduce manual work, streamline deployments, and improve consistency across environments.
Monitor system performance, availability, and capacity using modern tooling; proactively identify issues and participate in on-call support to restore services quickly when incidents occur.
Implement and support Continuous Integration/Continuous Delivery (CI/CD) pipelines using tools such as Jenkins, Git, and Terraform to enable reliable and repeatable releases.
Leverage containerization and orchestration technologies such as Docker and Kubernetes to build resilient, scalable platforms.
Work with databases (e.g., SQL, MySQL) and application stacks (e.g., Java-based services) to ensure data integrity, performance, and fault tolerance.
Partner with cross-functional teams, using Jira and other collaboration tools, to track work, communicate status, and drive continuous improvement in reliability and operational excellence.
Contribute to a culture of teamwork and collaboration by sharing knowledge, participating in post-incident reviews, and helping define best practices for reliability engineering.
Requirements
5+ years of related experience in Site Reliability Engineering, DevOps, systems engineering, or software engineering roles
Experience with deployments and production operations in Linux-based environments
Proficiency with scripting/coding (e.g., Python, Java, shell scripting)
Hands-on experience with AWS or other cloud platforms
Strong Linux administration skills
Experience with SQL/MySQL and database concepts
Experience with containerization and orchestration (Docker, Kubernetes)
Experience with CI/CD and automation tools (Jenkins, Git, Terraform, Ansible)
Experience with Infrastructure as Code (IaC) and automated configuration management
Must have a BA/BS or equivalent
Tech Stack
Ansible
AWS
Cloud
Docker
Java
Jenkins
Kubernetes
Linux
MySQL
Python
Shell Scripting
SQL
Terraform
Benefits
Comprehensive benefits and wellness packages
401(k) with company match
Paid time off
Full flex work weeks where possible
Variety of paid time off plans, including vacation, sick and personal time, holidays, and paid parental, military, bereavement, and jury duty leave
15 days of paid leave per calendar year
Additional 10 paid holidays per year
Short- and long-term disability benefits
Life, accidental death and dismemberment, personal accident, critical illness, and business travel and accident insurance