Sully.ai builds AI-powered healthcare products that make clinicians more efficient. The Senior Software Engineer, Research will optimize research infrastructure and integrate agentic systems, owning their reliability and performance across applications.
Responsibilities:
- Build and optimize core research infrastructure: evaluation pipelines, agent workflows, hallucination detectors, coding benchmarks, and research→production integrations
- Design, implement, and scale agentic systems across backend, frontend, and model integrations, collaborating closely with research and co-founders
- Own reliability, observability, and performance across agents (logging, tracing, instrumentation, safety checks)
- Ship research-proven features into production within 7 days, end-to-end
- Develop shared tools, SDKs, and internal products that accelerate iteration across Research, QA, and Engineering
- Audit all cross-agent flows for UI/UX consistency, correctness, and performance gaps
- Implement shared components, typed schemas, and contract-driven interfaces for reliability
- Establish instrumentation for frontend performance, agent consistency, latency, and model round-trip tracing
- Improve or replace brittle evaluation or agent pipelines identified during onboarding
- Partner with Research to productionize at least one new capability
- Deliver production-grade agentic workflows with error rates below 5% across evaluation benchmarks
- Launch a cross-agent design system and SDK adopted by at least 2 internal teams
- Establish a weekly deploy-and-measure cadence with performance dashboards, latency budgets, and error budgets
- Reduce agent latency and failure rates across at least two high-volume workflows
- Ship multiple research-to-production integrations with measurable CSAT or accuracy gains
Requirements:
- Senior-level full-stack engineering experience in React, TypeScript, and Node.js
- Proven ability to design, ship, and scale LLM-powered applications
- Expertise in API design, streaming, and CI/CD pipelines
- Strong cloud infrastructure background (AWS, GCP, or Azure)
- Track record of building reliable systems with measurable performance and error budgets