Kunai is a technology solutions provider for the financial services industry, focusing on modernizing and evolving client businesses. They are seeking a Senior DevOps Engineer to build and improve cloud infrastructure and delivery pipelines for production-grade systems, particularly for a billing management platform.
Responsibilities:
- Design and implement resilient, secure, and scalable cloud environments to support client platforms in production
- Drive production readiness and operations: monitoring and alerting, incident support, runbooks, capacity planning, reliability improvements, and release readiness
- Build and maintain CI/CD workflows, and reconfigure and enhance an existing proprietary pipeline using Argo
- Automate infrastructure provisioning and configuration using Infrastructure as Code (Terraform, CloudFormation, CDK)
- Support containerized deployments and orchestration using Docker and ECS
- Develop automation scripts and utilities in Python and/or Bash for deployment, configuration, and operational tasks
- Implement and maintain service configuration and deployment automation across environments (dev/test/stage/prod)
- Configure and manage cloud networking and access controls, including Security Groups
- Implement and maintain monitoring/observability capabilities (metrics, logs, traces, dashboards) and establish actionable SLOs/SLIs
- Plan and execute performance testing and scalability validation; partner with engineering to remediate bottlenecks and improve system performance
- Collaborate with engineering, architecture, security, and client stakeholders to triage issues, estimate work, and continuously improve delivery and reliability
Requirements:
- 5+ years of hands-on DevOps / Platform / SRE experience supporting production systems
- Strong experience with at least one public cloud provider (AWS, GCP, or Azure)
- Demonstrated practical experience with DevOps tools and practices, with a clear focus on production readiness and operations
- Experience designing and operating resilient systems (availability, scalability, fault tolerance)
- Strong Infrastructure as Code experience with Terraform, CloudFormation, and/or CDK
- CI/CD experience, including adapting and improving existing pipelines; experience with Argo preferred
- Containerization and orchestration experience with Docker and ECS
- Scripting/automation skills with Python and/or Bash
- Experience with service configuration and deployment automation
- Experience configuring and managing Security Groups and related cloud networking controls
- Hands-on experience with monitoring/observability and performance testing in production-like environments
- Bachelor's degree required. In lieu of a degree, candidates must demonstrate, in addition to the minimum years of experience required for the role, three years of specialized training and/or progressively responsible work experience in technology for each missing year of college
Preferred Qualifications:
- Experience supporting billing, payments, or financial platforms
- Familiarity with SRE practices (error budgets, incident management, postmortems)
- Exposure to multi-account/multi-environment cloud setups and governance