Upstart is a leading AI lending marketplace dedicated to reducing the cost and complexity of borrowing for all Americans. They are seeking a Senior Software Engineer focused on Site Reliability Tooling to enhance the reliability and effectiveness of their production systems, collaborate with other engineers, and shape the future of their Site Reliability Engineering team.
Responsibilities:
- Embody and share SRE principles at Upstart
- Exercise state-of-the-art SRE practices throughout the company
- Uphold a culture of visibility, ownership, and responsibility around service reliability
- Implement standards for monitoring microservices, web apps, mobile apps, databases, Kubernetes clusters, and machine learning platforms, in a fast-paced environment
- Improve incident response practices, both within SRE and throughout the company
- Automate away toil where automation makes sense
Requirements:
- Minimum of 6 years of combined experience across Software Engineering, Site Reliability, and/or DevOps Engineering, including CI/CD, TDD, internal tooling, observability, and other agile development practices
- Proficiency coding in Python, Go, JavaScript/TypeScript
- Proficiency with Infrastructure as Code (Terraform, CDK, CloudFormation, etc.)
- Software engineering background, with experience building internal tooling from scratch and applying other agile development techniques
- Strong software design & architecture skills
- Fundamentally sound with data structures & algorithms
- Experience with on-call and incident management environments
- Experience with observability, monitoring, and reporting tools (e.g., Datadog, Sumologic, etc.)
- Experience supporting SaaS software in a microservice-oriented cloud environment
- Ability to work with multiple teams for enterprise-wide deliverables
- Data/metrics-driven mindset
- Experience with service mesh
- Full Stack development skills
- Experience building tooling for an observability platform
- Experience leveraging LLM/GenAI to improve SRE efficiency and processes