Together AI is building the AI Acceleration Cloud, an end-to-end platform for the full generative AI lifecycle. The Senior Software Engineer, Observability, will be responsible for developing foundational backend services, improving existing systems, and collaborating with product teams to deliver effective solutions.
Responsibilities:
- Identify, design, and develop foundational backend services that power Together’s cloud platform
- Analyze and improve the robustness and scalability of existing distributed systems, APIs, databases, and infrastructure
- Partner with product teams to understand functional requirements and deliver solutions that meet business needs
- Write clear, well-tested, and maintainable software and infrastructure as code (IaC) for both new and existing systems
- Conduct design and code reviews, create developer documentation, and develop testing strategies for robustness and fault tolerance
- Participate in an on-call rotation to address critical incidents when necessary
Requirements:
- 5+ years of demonstrated experience building large-scale, fault-tolerant distributed systems and API microservices
- Experience designing, analyzing, and improving the efficiency, scalability, and stability of various system resources
- Excellent communication skills – able to write clear design docs and work effectively with both technical and non-technical team members
- Demonstrated experience with building and operating high-performance and/or globally distributed microservice architectures across one or more cloud providers (AWS, Azure, GCP)