Cloud Infrastructure Management: Design, deploy, and maintain scalable and resilient infrastructure on AWS using Infrastructure-as-Code (IaC).
Kubernetes Administration: Manage and optimize Kubernetes clusters for containerized applications, ensuring high availability and security.
Automation & CI/CD: Implement and manage CI/CD pipelines for efficient deployment, testing, and monitoring of applications.
Observability & Monitoring: Develop comprehensive monitoring solutions using Prometheus, Grafana, the LGTM stack, or similar tools to improve system reliability.
Security & Compliance: Apply best practices for cloud security, IAM policies, and compliance frameworks (SOC 2, ISO 27001, etc.).
Incident Response & Performance Optimization: Troubleshoot issues, perform root cause analysis, and implement fixes to optimize performance.
Infrastructure as Code (IaC): Utilize Terraform, Ansible, or similar tools to automate infrastructure provisioning and configuration management.
Collaboration & Knowledge Sharing: Work closely with software engineering, architecture, and security teams to promote a DevOps culture and best practices.
Disaster Recovery & Reliability Engineering: Design failover and backup strategies to ensure business continuity in the event of failures.
Requirements
Bachelor’s degree in Computer Science, Engineering, or a related field.
5+ years of experience in cloud infrastructure, SRE, or DevOps roles.
Interest in, or exposure to, trading or similar domains is desirable but not essential.