Work alongside the development and operations teams to ensure speedy and reliable software deployments
Monitor systems and improve the overall reliability of the platform
Create change management tickets and perform break/fix work on network appliances
Develop features using the AI coding tool and the repository of scripts to automate, scale, test, and secure the cloud infrastructure and pipelines
Enhance performance monitoring of the various systems via Splunk or other dashboard reporting tools
Identify performance bottlenecks and optimize the performance of cloud infrastructure
Develop and execute test strategies that simulate real-world failure scenarios, including network disruptions, hardware failures, and system overloads
Create, script, and run performance tests to measure system behavior under varying levels of load and traffic
Design, implement, and maintain automated test suites for infrastructure and application components
Ensure that testing is integrated into the CI/CD pipeline to validate system reliability with every release
Build automated systems for continuous performance testing, stress testing, and load testing
Work closely with SREs, developers, and operations teams to define reliability goals and develop appropriate testing strategies to validate those goals
Ensure that new services and features undergo thorough testing for performance, reliability, and failure recovery before deployment to production
Validate that monitoring, logging, and alerting mechanisms are functioning correctly by testing systems under failure conditions
Ensure that Service Level Indicators (SLIs) and Service Level Objectives (SLOs) are accurately measured and tracked through automated testing frameworks
Resolve most conflicts among timeline, budget, and scope independently, and use sound judgment to escalate complex or consequential issues to senior management
Requirements
Requires a BS degree and 8-12 years of relevant prior experience
Must currently possess, and be able to maintain, an active DoD Secret security clearance and be eligible for a Top Secret clearance
DoD 8570.01 IAT Level II certification (at minimum) is required prior to onboarding and must be maintained while supporting the SMIT contract
Must have a vendor certification (e.g., CCNA, CCNP, Juniper, Palo Alto, Aruba)
Experience designing, coding, debugging, and maintaining automation scripts (using Bash, Python, etc.) preferred
Experience with CI/CD toolsets (e.g., Jenkins, GitLab)
Experience with network switches, routers, VLANs, DMZs, VPNs, IPS, load balancers, and firewalls
Experience with SD-WAN and Arista
Experience in application administration, configuration, and integration
Familiarity with Agile development methodologies
Skilled and disciplined enough to work effectively with a distributed team
Knowledge of Agile and DevSecOps/SRE concepts and best practices, with a desire to grow that knowledge
Hands-on experience with Atlassian products (Jira, Confluence, Bitbucket, etc.)