Serve as a Staff DevOps Engineer specializing in AWS and Kubernetes to design, implement, and optimize scalable, secure cloud-native infrastructure
Lead PoC initiatives, oversee monitoring solutions, and translate SOX compliance into actionable cloud implementation plans
Break down silos by building a comprehensive team knowledge base, ensuring broad support capabilities
Provide technical leadership in cloud migration, security, and DevOps best practices, driving innovation and operational excellence across the organization
Responsibilities
Lead the design, implementation, and management of Kubernetes clusters on AWS EKS, ensuring high availability, scalability, and security
Implement and manage advanced features including autoscaling, monitoring, logging, and security policies
Spearhead proof-of-concept (PoC) initiatives for new tools and environments, evaluating their potential benefits for the organization
Manage the full lifecycle of Kubernetes clusters, including regular upgrades, patch management, version control, and performance optimization
Provide expert-level support and guidance to teams for deploying and optimizing applications on Kubernetes, including container orchestration and service mesh implementation
Design and implement monitoring and alerting solutions for applications and infrastructure using CloudWatch, Prometheus, and Datadog
Develop observability standards and dashboards, leveraging AI/AIOps approaches and SRE agents to enable anomaly detection, alert noise reduction, and automated root cause analysis
Develop and maintain Infrastructure as Code (IaC) using tools such as Terraform or AWS CDK, and implement CI/CD pipelines for efficient application deployment and image management
Design and implement security solutions, including the deployment and management of security tools, and translate SOX compliance requirements into actionable implementation plans for cloud environments
Lead initiatives for cloud migration and modernization of legacy applications, collaborating with cross-functional teams to support their cloud and infrastructure needs
Provide technical leadership and mentorship to junior engineers on cloud technologies and DevOps practices, implementing knowledge-sharing initiatives to ensure broad support capabilities across the team
Stay current with emerging AWS services and features, evaluating their potential benefits and optimizing cloud resource utilization and cost efficiency
Develop and maintain comprehensive documentation, including a team knowledge base, runbooks, and process documentation to eliminate information silos
Proactively identify areas of inefficiency and develop strategic plans for process improvements across the DevOps and cloud infrastructure landscape
Participate in on-call rotations to support critical cloud infrastructure and respond to emergency issues as needed
Requirements
Bachelor's degree in Computer Science, Information Technology, or related field
7+ years of experience in DevOps or Site Reliability Engineering roles
5+ years of hands-on experience with AWS services and cloud architecture
5+ years of hands-on experience with Kubernetes, including deep expertise in cluster management, troubleshooting, and optimization
Strong proficiency in at least one programming language (e.g., Python, Go, Java)
Extensive experience with Infrastructure as Code tools (e.g., Terraform, CloudFormation, AWS CDK)
Deep understanding of containerization technologies (Docker) and orchestration platforms (Kubernetes), including security best practices
Experience with CI/CD tools and methodologies, particularly GitLab CI and GitHub Actions
Strong knowledge of networking concepts and implementation in cloud environments
Excellent problem-solving skills and ability to troubleshoot complex systems
Proven ability to lead PoC initiatives and evaluate new technologies
Demonstrated experience in creating and maintaining technical documentation and knowledge bases
Demonstrated ability to identify operational inefficiencies and develop strategic plans for process improvements in complex cloud and DevOps environments
Strong analytical skills with the ability to translate technical insights into actionable business recommendations
Strong communication and mentoring skills, with the ability to effectively transfer knowledge to team members of varying experience levels
Proficiency in Microsoft Office Suite (specifically Word, Excel, and Outlook) and a general working knowledge of the internet for business use
Tech Stack
AWS
Cloud
Docker
Java
Kubernetes
Prometheus
Python
Terraform
Go
Benefits
Highly competitive and inclusive medical, dental, and vision coverage options
Health Savings Account for medical expenses and dependent care expenses
Flexible Spending Account to pay for certain out-of-pocket expenses
Paid time off, including vacation, sick time, and holidays
401(k) match and financial planning tools
Long-term disability (LTD) and short-term disability (STD) insurance coverage, as well as voluntary benefit options