Bamboo Health is the leader in Real-Time Care Intelligence™ solutions aimed at improving lives for everyone experiencing physical and behavioral health challenges. The Sr. Software Engineer ensures the reliability, stability, and performance of production systems across multiple applications and services, driving improvements through monitoring, automation, and resilient system design.
Responsibilities:
- Own the end-to-end lifecycle of production issues, including triage, investigation, incident response, postmortems, and follow-up actions
- Troubleshoot complex, cross-system issues, identify root causes, and implement long-term fixes
- Design, implement, and maintain monitoring, alerting, and dashboards to proactively detect reliability and performance issues
- Use AI-assisted tools responsibly to accelerate debugging, log analysis, incident response, and knowledge sharing
- Partner with Product, Engineering, and Customer Success to resolve customer-impacting issues efficiently and transparently
- Reduce recurring operational issues through automation, improved tooling, and process improvements
- Contribute code to improve reliability, observability, scalability, and operational safety
- Document incidents and standard operating procedures to improve response consistency and team effectiveness
Requirements:
- 4+ years of experience in Site Reliability Engineering, Production Support, or a similar role focused on system reliability and operations
- Strong experience supporting and troubleshooting production systems, including ownership of support tickets and incident response
- Proficiency in Ruby and the ability to read, debug, and contribute to application code when needed
- Experience with monitoring, alerting, and observability tools (metrics, logs, traces, dashboards)
- Solid understanding of SQL and database fundamentals, including performance and troubleshooting
- Familiarity with cloud platforms (AWS preferred), including serverless architectures and distributed systems
- Experience using automation, scripting, or tooling (e.g., Python) to reduce operational effort
- Comfort using or learning AI-supported tools (e.g., ChatGPT, Copilot, or role-specific tools) to improve daily workflows
- A forward-thinking, curious mindset with an openness to experimenting with new technologies
- Strong analytical and problem-solving skills, with sound judgment and creativity in designing solutions
- Proven ability to thrive in fast-paced, high-growth, and rapidly evolving environments
- Ability to work effectively in a remote-first environment, ensuring high-quality virtual interactions with minimal distractions
- Ability to travel periodically for work