Coinbase is on a mission to increase economic freedom in the world and is seeking a Senior Site Reliability Engineer to lead the development of scalable AI products. This role involves collaborating with various teams to support IT services, automate workflows, and ensure cloud security while participating in incident response and maintaining high standards for technical documentation.
Responsibilities:
- AI-Driven Innovation: Join a high-performing team of skilled engineers driving AI transformation at Coinbase. This role involves leading the development of scalable AI products with direct exposure to high-level executives, focusing on rapid ideation, execution, and delivering impactful solutions in a dynamic, incubator-style environment
- Partner with the Coinbase Infrastructure team to maintain and extend existing CI/CD frameworks that support IT services, including enterprise network platforms
- Partner with security and compliance to build surveillance tooling into deployment pipelines
- Design and implement automation to streamline overall operational IT support workflows
- Own Kubernetes deployment, implementation, and support
- Build a technological roadmap based on product requirements
- Participate in on-call to support the AWS service deployment pipeline
- Promote DevSecOps mentality and establish best practices to ensure top-tier cloud security
- Set and maintain a standard of excellence for technical documentation across IT engineering
- Participate in an operational environment with strict SLAs and managed incident response and disaster recovery strategies
- Facilitate incident response, conduct root cause analysis and blameless retrospectives
- Define metrics and design/implement automation opportunities based on monitoring/observability
- Develop and maintain integrations with other systems, such as source control and build systems
- Troubleshoot and resolve technical issues with internal tooling
Requirements:
- 5+ years of experience supporting network infrastructure
- 5+ years of experience automating cloud infrastructure
- Proficient in at least one scripting language (Bash, Python, Ruby, Go, etc.)
- Proficiency with version control (Git) in CI/CD workflows
- Strong experience supporting AWS services and CI/CD workflows using Terraform or an equivalent framework
- Strong experience with configuration management systems like Terraform, Ansible, Chef, Puppet, or Salt
- Strong experience with containers and container orchestration, such as Docker and Kubernetes
- Demonstrated ability to responsibly use generative AI tools and copilots (e.g., LibreChat, Gemini, Glean) in daily workflows, continuously learn as tools evolve, and apply human-in-the-loop practices to deliver business-ready outputs and drive measurable improvements in efficiency, cost, and quality
- Expertise with Linux, Bash, Ruby, Python, and/or Go
- Expertise automating EC2 or container deployment with Terraform
- Strong network security fundamentals
- Experience managing and leveraging log aggregation
- Experience working in a highly regulated environment
- Experience in a fast-paced, high-growth company
- Experience in a remote-first IT environment