Mapbox is the leading real-time location platform for a new generation of location-aware businesses. The Senior Cloud Platform Engineer will be responsible for delivering cloud-native containerized infrastructure and deployment platforms, managing AWS resources, and promoting operational excellence within the Cloud Platform team.
Responsibilities:
- Actively onboard AWS resources to the declarative gitops-based framework utilizing Terraform and Terragrunt
- Maintain and troubleshoot legacy cloud infrastructure in AWS that is deployed with CloudFormation/CDK and uses ECS, Lambda, EMR, etc.
- Architect and promote Kubernetes deployments for new services
- Lead the migration of deployment pipelines from ECS and CloudFormation to EKS and ArgoCD
- Architect a centralized CI pipeline framework utilizing GitHub Actions and Runs-on
- Broadly influence and lead the Mapbox Cloud Platform strategy around AWS architecture, open-source tools and frameworks
- Configure and maintain a comprehensive observability platform, such as Datadog or Observe, to enable real-time monitoring, alerting, and analytics
- Promote a culture of operational excellence by testing and monitoring our systems and code, and providing on-call support for the platform services
- Document your work and decision-making processes, and lead presentations and discussions in a way that is easy for others to understand
- Uphold a culture of collaboration, transparency, creativity, inclusion, and data-driven decisions
Requirements:
- 5+ years of experience using infrastructure-as-code frameworks (Terraform, Terragrunt, Atlantis, CDK) to manage AWS infrastructure
- 4+ years of experience orchestrating containerized workloads at scale using EKS and ECS
- 4+ years of experience managing scalable CI/CD frameworks in a distributed engineering organization using GitHub Actions
- Strong expertise with Kubernetes, ArgoCD, Istio
- Proven ability to design and develop cost-efficient, secure, and durable solutions on AWS using EKS, ECS, EC2, Lambda, Fargate, CloudFront, IAM, Route53, and DynamoDB
- Proficient in at least one programming language, such as Python, Node.js, or Go
- Experience configuring and managing observability systems in a distributed large-scale environment using Datadog, CloudWatch, or similar
- Experience with incident response practices including blameless post-mortems and resilience engineering concepts
- A desire to share your expertise through documentation, mentorship, and both written and verbal discussion
- Ability to work asynchronously and independently with minimal supervision, lead by example, and make decisions based on priorities and business goals