Paxos is on a mission to open the world’s financial system to everyone by enabling the instant movement of any asset in a trustworthy way. As a Senior Site Reliability Engineer, you will shape and scale the infrastructure that powers our platform, with a focus on Kubernetes and AWS services, and keep our systems reliable, secure, and performant.
Responsibilities:
- Design, build, and operate scalable, highly available cloud infrastructure primarily on AWS
- Manage and evolve our Kubernetes environments to support the deployment and operation of modern, containerized applications
- Define and implement Infrastructure as Code (IaC) using tools like Terraform, CDK, or Crossplane
- Automate infrastructure provisioning, configuration, maintenance, and monitoring to reduce manual effort and improve reliability
- Apply best practices around security, observability, and cost optimization across infrastructure and services
- Manage and optimize database technologies, with a focus on Amazon RDS and Aurora
- Partner with development teams to ensure seamless deployment and integration of new features and updates
- Investigate and resolve incidents, perform root cause analysis, and implement long-term fixes
- Participate in on-call rotations and provide support for critical production systems
- Contribute to SRE best practices, internal tooling, and team knowledge sharing
Requirements:
- Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience
- 5+ years of experience in Site Reliability Engineering, DevOps, or related infrastructure roles
- Deep expertise in public cloud platforms, especially AWS, with hands-on experience in services like EC2, S3, Lambda, CloudWatch, and IAM
- Strong proficiency with Kubernetes and container orchestration: you've run production workloads and understand cluster management, scaling, and troubleshooting
- Extensive experience with Infrastructure as Code (IaC) using tools such as Terraform, Pulumi, or Crossplane
- Solid scripting or programming skills in languages like Python, Bash, or Go, with a strong focus on automation
- Excellent problem-solving and debugging skills, with a systems-thinking mindset
- Strong communicator who thrives in collaborative, remote-first teams
- Working knowledge of PostgreSQL and managed database services like Amazon RDS and Aurora is a plus, but infrastructure is your main game