Design, build, and operate scalable, secure, and reliable service mesh infrastructure that underpins service-to-service communication across Slack's platform.
Implement and maintain core service mesh capabilities, including service discovery, observability, traffic routing, mTLS, and policy enforcement.
Troubleshoot and resolve production issues spanning distributed systems, Kubernetes environments, networking layers, and Linux-based infrastructure.
Drive improvements in platform reliability, performance, and operational efficiency through thoughtful automation and tooling.
Contribute enhancements and fixes to internal tooling and open-source technologies, including Envoy.
Play an active role in incident response and operational excellence, helping uphold platform availability and service-level objectives (SLOs).
Requirements
Must have lawful permanent residency in the U.S.
2+ years of experience in software engineering, infrastructure engineering, or site reliability engineering.
Hands-on experience with Kubernetes and cloud platforms such as AWS or GCP.
Proven ability to work within distributed systems, microservices, or cloud-native environments.
Strong collaboration and communication skills: you work well across teams and can make complex technical topics accessible.