About this role
Role Overview
Own the architecture, design, and evolution of our Kubernetes-based platform on AWS, ensuring scalability, resilience, and operational excellence
Develop, maintain, and optimize Kubernetes infrastructure using Infrastructure as Code (CDK), enforcing best practices and architectural standards
Act as the technical guide and domain expert for Kubernetes architecture: collaborate closely with the Architecture team to bridge high-level strategy and implementation, and give engineering teams clear patterns, reference architectures, and best-practice recommendations
Lead technical decision-making across multiple teams, providing mentorship, design reviews, and hands-on support to drive platform consistency and quality
Drive proactive improvements across the platform, identifying scaling issues, reliability gaps, or operational inefficiencies before they become problems
Design and implement secure, compliant, and highly observable Kubernetes environments, integrating monitoring, logging, and alerting systems
Lead and support the migration of non-cloud-native applications to the new infrastructure, assessing readiness and implementing best practices for scalability and maintainability
Champion automation and GitOps practices to reduce manual work, eliminate drift, and improve release velocity
Lead cross-team initiatives and influence technical direction outside your immediate organization, driving adoption of cloud-native best practices
Participate in on-call rotations and contribute to incident response and root cause analysis
Requirements
Expertise in Kubernetes cluster architecture and operations, including designing and maintaining multi-node clusters for performance, scalability, and fault tolerance
Proficiency with service mesh technologies to manage observability, security, and traffic routing
Experience implementing auto-scaling, optimizing resource utilization, and planning rolling upgrades and disaster recovery scenarios with zero downtime
Solid knowledge of and experience with AWS and EKS
Ability to assess application readiness and provide migration strategies for different application types while ensuring minimal disruption during transitions
Automation mindset with the drive to identify and automate repetitive tasks and manual processes
Experience with observability tools such as Datadog to ensure high uptime and fast issue resolution
Strong problem-solving and troubleshooting abilities, with a collaborative, team-oriented approach and a can-do attitude
Effective communication and collaboration skills across stakeholder groups of varied technical backgrounds, with strong written and verbal English proficiency
Fluency with AI tools and large language models to support platform engineering, automation, and operational efficiency
Tech Stack
AWS
Cloud
Kubernetes
Node.js
Benefits
Competitive compensation
Flexible paid time off policies, including Quarterly Self-Care Days (4 extra paid days off annually) and Volunteer Days
Parental leave
Comprehensive health coverage, including dependents
Home office setup support
LastPass Families free account for up to 5 members
Continuous learning and development opportunities, including an annual learning stipend to invest in your growth
Peer-to-peer recognition through Motivosity
Employee Assistance Program for well-being support
Remote work stipend to support your home office needs
Short-term or remote-centric work arrangements for added flexibility