Together AI is building the AI Acceleration Cloud, an end-to-end platform for the full generative AI lifecycle. As a Senior Backend Engineer, you will play a key role in building highly available cloud infrastructure that supports cutting-edge ML hardware and provides self-serve AI cloud services.
Responsibilities:
- Identify, design, and develop foundational backend services that power Together’s cloud platform
- Analyze and improve the robustness and scalability of existing distributed systems, APIs, databases, and infrastructure
- Partner with product teams to understand functional requirements and deliver solutions that meet business needs
- Write clear, well-tested, and maintainable software and infrastructure as code (IaC) for both new and existing systems
- Conduct design and code reviews, create developer documentation, and develop testing strategies for robustness and fault tolerance
- Participate in an on-call rotation to address critical incidents when necessary
Requirements:
- 5+ years of demonstrated experience building large-scale, fault-tolerant distributed systems and API microservices
- Experience designing, analyzing, and improving the efficiency, scalability, and stability of system resources
- Excellent communication skills – able to write clear design docs and work effectively with both technical and non-technical team members
- Demonstrated experience building and operating high-performance and/or globally distributed microservice architectures across one or more cloud providers (AWS, Azure, GCP)
- Strong systems knowledge across compute, networking, and storage, including concurrency, memory management, performant I/O, and scale
- Experience developing against and managing a relational database, such as PostgreSQL
- Expert-level proficiency in one or more programming languages (Golang preferred)
- Proficiency in version control practices and integrating IaC with CI/CD pipelines
- Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience
- Experience with Kubernetes and containers
- Experience building and operating data infrastructure (Kinesis, Airflow, Kafka, etc.)