Lead the design and architecture of scalable, resilient, and secure cloud-native platforms and infrastructure on providers such as AWS and GCP.
Develop, manage, and champion the use of IaC tools like Terraform and Crossplane to automate infrastructure provisioning and management.
Build, maintain, and optimize CI/CD pipelines using GitHub Actions and ArgoCD to automate the build, testing, and deployment of applications, enabling faster and more reliable software delivery.
Manage and scale our containerized environments using technologies like Docker and Kubernetes.
Implement and manage robust monitoring, logging, and alerting solutions (e.g., Prometheus, Grafana, Datadog) to ensure system health and proactively identify and resolve issues.
Identify problems, develop and implement solutions, and drive innovation to improve infrastructure performance, stability, and efficiency.
Work closely with development teams to provide self-service tools and platforms, troubleshoot complex issues, and act as a subject matter expert for our infrastructure.
Implement and enforce security best practices, conduct security assessments, and ensure compliance with industry standards.
Provide technical guidance and mentorship to junior engineers, fostering a culture of technical excellence and continuous improvement.
Requirements
Bachelor's or Master's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
5+ years of experience in a Platform Engineer, DevOps, or SRE role, with a proven track record of working on large-scale, distributed systems.
Deep technical experience with a major cloud provider (AWS or GCP).
Extensive hands-on experience with IaC tools like Ansible, Terraform, and/or Crossplane.
Strong proficiency with containerization (Docker) and container orchestration (Kubernetes).
Solid understanding of CI/CD principles and experience with automation tools like GitHub Actions and ArgoCD.
Proficiency in one or more programming/scripting languages such as Python, Go, or Bash.
Strong knowledge of Linux/Unix administration and networking fundamentals (TCP/IP, DNS, HTTP).
Experience with observability tools such as Prometheus, Grafana, or Datadog.