Sr TechOps Lead Engineer (AWS Cloud)
Department: Technology / Engineering
Role Overview
We are seeking a highly experienced TechOps SME/Lead Engineer with deep expertise in AWS Cloud to lead our cloud infrastructure, DevOps practices, reliability engineering, and operational excellence initiatives. This role is both strategic and hands-on: it is responsible for designing scalable architectures, improving automation, ensuring system reliability, and leading the TechOps team.
Key Responsibilities
- Architect and manage secure, scalable, and highly available infrastructure on AWS.
- Design multi-account AWS environments using AWS Organizations.
- Implement VPC architecture, IAM policies, networking, and security best practices.
- Oversee EC2, ECS/EKS, Lambda, RDS, S3, CloudFront, and related AWS services.
- Optimize AWS cost management and resource utilization.
Reliability & Production Operations
- Implement Site Reliability Engineering (SRE) best practices.
- Define SLIs, SLOs, and error budgets.
- Manage monitoring and alerting (CloudWatch, Datadog, Prometheus, Grafana).
- Lead incident response, root cause analysis (RCA), and postmortems.
- Ensure 24/7 uptime and operational resilience.
Security & Compliance
- Implement IAM best practices and least-privilege access controls.
- Manage secrets and key management (AWS KMS, Secrets Manager).
- Conduct vulnerability management and patching.
- Support compliance initiatives (SOC 2, ISO 27001, GDPR as applicable).
- Lead disaster recovery planning and backup strategies.
Leadership & Strategy
- Lead and mentor a team of DevOps/TechOps engineers.
- Establish operational KPIs and performance benchmarks.
- Manage on-call rotations and escalation processes.
- Collaborate with Engineering, Product, Security, and Data teams.
- Contribute to long-term infrastructure strategy and cloud roadmap.
Required Qualifications
- Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
- 10+ years in DevOps, Cloud Engineering, or Infrastructure roles.
- 5+ years leading technical teams.
- Strong hands-on experience with AWS services (EC2, EKS, RDS, S3, IAM, VPC, Lambda).
- Deep knowledge of networking, Linux systems, and distributed systems.
- Experience with Infrastructure-as-Code (Terraform or CloudFormation).
- Strong scripting skills (Python, Bash, or similar).
- Experience with containerization (Docker) and Kubernetes (EKS preferred).
Key Competencies
- Strong architectural thinking
- Hands-on technical leadership
- Crisis and incident management
- Strategic planning and execution
- Excellent cross-functional communication
Success Metrics
- 99.9%+ production uptime
- Reduced deployment lead time
- Reduced incident frequency and MTTR
- Improved cost efficiency
- High-performing and scalable TechOps function
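
For context on the 99.9% uptime target and the error budgets mentioned above, the following is a minimal sketch of how an availability target might translate into allowed downtime. The function names, the 30-day window, and the sample downtime figure are assumptions chosen for illustration and are not part of the role definition.

```python
def allowed_downtime_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of downtime an availability target permits over the window.

    window_days=30 is an assumed reporting window, not a stated requirement.
    """
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo_percent / 100)


def error_budget_remaining(slo_percent: float, downtime_minutes: float,
                           window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = allowed_downtime_minutes(slo_percent, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    # 99.9% over 30 days allows roughly 43.2 minutes of downtime.
    print(round(allowed_downtime_minutes(99.9), 1))
    # Hypothetical example: 10.8 minutes of downtime leaves about 75% of the budget.
    print(round(error_budget_remaining(99.9, 10.8), 2))
```

Under these assumptions, a 99.9% target over 30 days leaves about 43 minutes of total downtime, which is why incident frequency and MTTR appear alongside uptime in the metrics above.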