Coinbase is on a mission to increase economic freedom globally and is seeking a passionate Staff Site Reliability Engineer for its Core AI Infrastructure team. The role involves leading the development of scalable AI products, supporting CI/CD frameworks, and ensuring cloud security while working in a dynamic environment with high-level executives.
Responsibilities:
- Join a high-performing team of skilled engineers driving AI transformation at Coinbase
- Lead the development of scalable AI products with direct exposure to high-level executives, focusing on rapid ideation, execution, and delivery of impactful solutions in a dynamic, incubator-style environment
- Partner with the Coinbase Infrastructure team to support and extend existing CI/CD frameworks for IT services, including enterprise network platforms
- Partner with security and compliance to build surveillance tooling into deployment pipelines
- Design and implement automation to streamline overall operational IT support workflows
- Handle Kubernetes deployment, implementation, and support
- Build a technological roadmap based on product requirements
- Participate in on-call to support the AWS service deployment pipeline
- Promote DevSecOps mentality and establish best practices to ensure top-tier cloud security
- Set and maintain a standard of excellence for technical documentation across IT engineering
- Participate in an operational environment with strict SLAs and managed incident response and disaster recovery strategies
- Facilitate incident response, conduct root cause analysis and blameless retrospectives
- Define metrics and design/implement automation opportunities based on monitoring/observability
- Develop and maintain integrations with other systems, such as source control and build systems
- Troubleshoot and resolve technical issues with internal tooling
Requirements:
- 10+ years of experience supporting network infrastructure
- 10+ years of experience automating cloud infrastructure
- Proficiency in at least one scripting language (Bash, Python, Ruby, Go, etc.)
- Proficiency with version control (Git) in CI/CD workflows
- Strong experience supporting AWS services and CI/CD workflows using Terraform or an equivalent framework
- Strong experience with configuration management systems like Terraform, Ansible, Chef, Puppet, or Salt
- Strong experience with containers and container orchestration, such as Docker and Kubernetes
- Demonstrated ability to responsibly use generative AI tools and copilots (e.g., LibreChat, Gemini, Glean) in daily workflows, continuously learn as tools evolve, and apply human-in-the-loop practices to deliver business-ready outputs and drive measurable improvements in efficiency, cost, and quality
- Expertise with Linux, Bash, Ruby, Python, and/or Go
- Expertise automating EC2 or container deployments with Terraform
- Strong network security fundamentals
- Experience managing and leveraging log aggregation
- Experience working in a highly regulated environment
- Experience in a fast-paced, high-growth company
- Experience in a remote-first IT environment