Avantos.ai is building the industry’s first AI-native operating system for financial services and is seeking a Senior DevOps Engineer to own and evolve its infrastructure and deployment practices. The role involves designing and maintaining AWS cloud infrastructure, optimizing CI/CD pipelines, and ensuring compliance with financial services security requirements.
Responsibilities:
- Design, implement, and maintain our AWS cloud infrastructure using infrastructure-as-code principles with Terraform
- Build and optimize CI/CD pipelines to enable rapid, safe deployments across multiple environments
- Own the observability strategy: implement comprehensive monitoring, logging, and alerting systems using Datadog and other tooling
- Architect and manage containerized workloads on ECS Fargate and evaluate migration paths to Kubernetes
- Establish and enforce security best practices, working closely with compliance teams on financial services requirements
- Design and implement disaster recovery, backup, and business continuity strategies
- Optimize system performance, cost efficiency, and resource utilization across AWS services
- Collaborate with engineering teams to improve service reliability, reduce toil, and establish SLOs/SLIs
- Participate in incident response and conduct thorough post-mortems to drive continuous improvement
- Mentor engineers on DevOps practices, cloud architecture patterns, and operational excellence
Requirements:
- 8+ years of experience in DevOps, SRE, or infrastructure engineering roles
- Expert-level proficiency with AWS, including ECS Fargate, ALB, Cognito, S3, SQS, and related services
- Deep hands-on experience with Terraform for managing complex, multi-account AWS environments
- Strong scripting and automation skills in Python and/or Bash
- Proven experience designing and implementing CI/CD pipelines (GitHub Actions, ArgoCD, or similar)
- Solid understanding of containerization technologies (Docker) and orchestration platforms (Kubernetes/ECS)
- Experience with observability and monitoring tools (Datadog, CloudWatch, or equivalent)
- Deep knowledge of networking, security, and AWS best practices
- Strong problem-solving abilities and experience troubleshooting complex distributed systems
- Excellent communication skills and ability to work cross-functionally with engineering teams
- Experience in financial services or highly regulated industries
- Familiarity with event-driven architectures and message queue systems (Kafka, SQS)
- Experience with PostgreSQL performance tuning and RDS management
- Knowledge of microservices architecture patterns and service mesh technologies
- Experience with security tooling, vulnerability scanning, and compliance frameworks
- Familiarity with our application stack (Golang, Next.js, PostgreSQL)
- Experience managing AI/ML infrastructure and AWS Bedrock