PointOne is a venture-backed startup that builds infrastructure for the legal industry, focusing on timekeeping and billing systems. They are seeking a Product Reliability Engineer to ensure the health, stability, and observability of their systems, working closely with customers to address reliability issues and improve overall system resilience.
Responsibilities:
- Respond quickly to automated alerts and customer-reported issues
- Triage, diagnose, and resolve production incidents with a bias toward permanent fixes over workarounds
- Build and maintain incident response playbooks and postmortem processes
- Coordinate cross-functionally with customer success managers and key account stakeholders to maintain customer trust in the event of an incident
- Design and instrument telemetry, logging, and alerting across our serverless AWS stack
- Build dashboards and health metrics that surface issues before customers feel them
- Identify recurring failure patterns and drive systemic fixes into the codebase
- Reduce operational toil through automation
- Contribute directly to the codebase: improve resilience, reduce tech debt, and build automation so that bugs are resolved quickly with minimal human intervention
- Partner with engineers on new feature launches to assess reliability risks before they ship
- Make data-driven recommendations on where to invest in stability
Requirements:
- 2+ years of software engineering experience, with meaningful time spent in reliability, platform, or production-facing roles
- Strong debugging instincts and comfort tracing failures across distributed systems using logs, traces, and metrics
- Hands-on experience with AWS (Lambda, SQS, RDS, CloudWatch or equivalent)
- Comfortable reading and writing Go, TypeScript, or similar backend languages
- Experience building or improving observability infrastructure (alerting, dashboards, telemetry)
- High ownership mentality: you close the loop, you write the postmortem, you ship the fix
- Experience in legaltech, fintech, healthtech, or other high-sensitivity, always-on environments