Develop and maintain resilient, cost-efficient infrastructure using AWS and other cloud services to meet evolving business needs.
Use Infrastructure as Code (IaC) solutions to enable automated provisioning and ensure consistency across all environments.
Design, develop, and maintain advanced CI/CD pipelines that integrate automated testing and keep deployments efficient at scale.
Design and deploy comprehensive monitoring, logging, and alerting solutions (e.g., Kiali, Prometheus) to proactively detect risks and assess system health.
Monitor AWS usage patterns to identify and implement strategies for cost reduction and resource optimization.
Implement security improvements to reduce the platform's attack surface, and develop disaster recovery processes to mitigate risks.
Continuously evaluate emerging paradigms, including agentic development, context engineering, and AIOps, to automate routine tasks, optimize deployment workflows, and move infrastructure toward predictive, self-healing capabilities.
Requirements
8+ years of experience in AWS cloud technologies, operating and scaling production SaaS workloads in DevOps or Infrastructure roles.
Advanced expertise with Terraform, Helm, Jaeger tracing, and centralized logging, and in managing event-driven Kubernetes microservice architectures.
Background in software development (e.g., building services, APIs, or data pipelines) is a strong asset.
Hands-on experience with EKS, SQS, SNS, Lambda, AuroraDB, Keyspaces and OpenSearch, operated in high-availability production environments.
Proven experience managing Kubernetes clusters on EKS, including configuring autoscaling and load-monitoring criteria with tools such as Karpenter.
Experience setting up and maintaining CI/CD pipelines using ArgoCD and GitHub Actions, including blue/green deployments.
Strong understanding of Linux systems and proficiency in Python and other scripting languages, with bonus points for experience building production services.
Track record of mentoring engineers, conducting architecture reviews, and proposing and planning improvements.
A proven track record of monitoring cloud costs and implementing effective reduction strategies.
Top-notch communication skills with a mindset focused on knowledge sharing and cross-functional excellence.
Proven experience leveraging AI tools to surface insights faster, and a passion for fostering an engineering culture rooted in automation and AI-augmented productivity.
Tech Stack
AWS
Cloud
Kubernetes
Linux
Microservices
Prometheus
Python
Terraform
Benefits
Extended Health Benefits
$600 Annual Lifestyle Spending Account
9 mental health/sick days
Volunteer Paid Time Off
Group RRSP/TFSA Retirement Savings Match
Paid Time Off
Monthly Work from Home Stipend
Employee Stock Option Plan
$1,000 Annual Learning Budget
Remote-first environment with a 'Work from (Almost) Anywhere' policy