Blueprint is a technology solutions firm headquartered in Bellevue, Washington, focused on helping organizations unlock value through innovative technology.
The Software Engineer – Service Reliability & Observability will maintain and improve customer-facing data services, ensuring system reliability, security, and performance while collaborating with cross-functional teams to resolve incidents and enhance service stability.
Responsibilities:
- Own day-to-day health of a production service, including bug fixes, reliability improvements, and security remediation
- Investigate and resolve customer-impacting incidents through deep root-cause analysis
- Implement security fixes and ensure adherence to enterprise security standards
- Improve service observability by enhancing logging, metrics, tracing, and alerting
- Build and maintain telemetry pipelines and dashboards for monitoring service health and performance
- Automate recurring operational and support tasks to reduce manual effort and increase efficiency
- Contribute to service hardening efforts, including resiliency improvements and failure-mode analysis
- Maintain and enhance backend services (primarily C#/.NET), with occasional support for frontend components
- Create and maintain technical documentation for service workflows, behaviors, and known issues
- Communicate service health and engineering metrics to stakeholders, including incident trends and improvements
Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience
- Strong experience developing and maintaining production services using C#/.NET
- Experience troubleshooting and resolving live-site issues in customer-facing applications
- Hands-on experience with observability tools (logging, metrics, distributed tracing, alerting)
- Familiarity with secure coding practices and handling security vulnerabilities in production systems
- Experience automating operational workflows using scripts or tooling
- Ability to collaborate effectively across engineering, support, and security teams during incident response
- Strong analytical and problem-solving skills with a focus on root-cause analysis
- Clear communication skills for documenting incidents, writing postmortems, and providing status updates
- Experience supporting large-scale or enterprise data services/APIs with high reliability requirements
- Familiarity with modern frontend technologies (e.g., React) for debugging or minor enhancements
- Experience with cloud platforms (e.g., Azure) and service monitoring tools
- Proven experience improving observability and reducing operational toil through automation
- Understanding of compliance, privacy, and data governance in regulated environments
- Experience with incident management processes, post-incident reviews, and operational excellence practices
- Ability to create concise, leadership-ready summaries of incidents, risks, and improvements