About this role
Role Overview
Own and evolve our infrastructure, reliability, and deployment practices
Build the foundational platform that enables our engineering teams to ship quickly and reliably while maintaining the security and compliance standards required in financial services
Design, implement, and maintain our AWS cloud infrastructure using infrastructure-as-code principles with Terraform
Build and optimize CI/CD pipelines to enable rapid, safe deployments across multiple environments
Own the observability strategy: implement comprehensive monitoring, logging, and alerting using Datadog and other tooling
Architect and manage containerized workloads on ECS Fargate and evaluate migration paths to Kubernetes
Establish and enforce security best practices, working closely with compliance teams on financial services requirements
Design and implement disaster recovery, backup, and business continuity strategies
Optimize system performance, cost efficiency, and resource utilization across AWS services
Collaborate with engineering teams to improve service reliability, reduce toil, and establish SLOs/SLIs
Participate in incident response and conduct thorough post-mortems to drive continuous improvement
Mentor engineers on DevOps practices, cloud architecture patterns, and operational excellence
Requirements
8+ years of experience in DevOps, SRE, or infrastructure engineering roles
Expert-level proficiency with AWS services including ECS Fargate, ALB, Cognito, S3, SQS, and related services
Deep hands-on experience with Terraform for managing complex, multi-account AWS environments
Strong scripting and automation skills in Python and/or Bash
Proven experience designing and implementing CI/CD pipelines (GitHub Actions, ArgoCD, or similar)
Solid understanding of containerization technologies (Docker) and orchestration platforms (Kubernetes/ECS)
Experience with observability and monitoring tools (Datadog, CloudWatch, or equivalent)
Deep knowledge of networking, security, and AWS best practices
Strong problem-solving abilities and experience troubleshooting complex distributed systems
Excellent communication skills and ability to work cross-functionally with engineering teams
Nice to haves
Experience in financial services or highly regulated industries
Familiarity with event-driven architectures and message queue systems (Kafka, SQS)
Experience with PostgreSQL performance tuning and RDS management
Knowledge of microservices architecture patterns and service mesh technologies
Experience with security tooling, vulnerability scanning, and compliance frameworks
Familiarity with our application stack (Golang, Next.js, PostgreSQL)
Experience managing AI/ML infrastructure and AWS Bedrock
Tech Stack
AWS
Cloud
Distributed Systems
Docker
JavaScript
Kafka
Kubernetes
Microservices
Next.js
Postgres
Python
Terraform
Go
Benefits
Competitive compensation + meaningful equity
Opportunity to build production infrastructure from the ground up for a rapidly scaling AI platform
A culture optimized for engineering excellence, focus, deep work, and ownership—not ticket factories