Confidential is a high-growth, remote-first technology company building automation and AI infrastructure used by millions of businesses globally. The Events Team owns a high-scale event streaming platform, and this role focuses on keeping that system reliable and scalable as it evolves.
Responsibilities:
- Design, build, and maintain scalable event-driven systems using Kafka (MSK)
- Develop and manage AWS-based infrastructure (SQS, S3, Lambda, Aurora, Redis) with Terraform
- Improve resiliency, observability, and performance across distributed systems
- Build internal libraries and tooling to simplify event publishing and consumption for other teams
- Contribute to data governance and data hygiene practices
- Refactor and evolve systems as technologies and best practices advance
- Participate in on-call rotations to maintain uptime and reliability
- Collaborate through code reviews, technical discussions, and mentoring
Requirements:
- 5+ years of software development experience
- Strong proficiency in Python
- 2+ years of hands-on experience with Kafka/MSK and event streaming systems at scale
- Experience designing and supporting highly available cloud infrastructure (AWS preferred)
- Infrastructure-as-Code experience (Terraform or similar)
- Strong understanding of distributed systems, reliability, and observability
- Collaborative mindset and ability to work effectively in a remote environment
- Experience with Go
- SRE or production support experience
- Experience with cloud queue systems (SQS preferred)
- CI/CD pipeline experience (e.g., GitLab)