A Place for Mom is the leading platform guiding families through every stage of the aging journey. They are seeking a Staff Software Engineer to join their Agentic Platform team, responsible for building foundational AI capabilities and infrastructure. The role involves designing and building core platform primitives, ensuring safety and compliance, and collaborating with product teams to enhance the platform's usability.
Responsibilities:
- Design and build core platform primitives including provider abstraction layers (OpenAI, Anthropic, Google), structured output validation, streaming infrastructure, and token management systems
- Own safety and compliance infrastructure including composable guardrail systems, PII detection/redaction, audit logging, and privacy-first observability that never leaks sensitive data to third parties
- Build evaluation infrastructure that enables systematic quality measurement for non-deterministic LLM outputs—datasets, scorers (exact match, LLM-as-judge, schema validation), CI/CD integration, and regression detection
- Lead churn containment strategy: design provider adapters and an SDK architecture that absorb rapidly changing LLM provider SDKs without breaking consuming applications
- Architect prompt lifecycle management systems including version control, Langfuse integration, GitHub-based review workflows, and deployment pipelines
- Design Agent-as-a-Service infrastructure for long-running async tasks using AWS EventBridge, DynamoDB, and PostgreSQL
- Collaborate with consuming teams to understand their needs, onboard them to the platform, and provide technical support
- Influence architecture, technology selections, and engineering standards across the broader organization
- Create reference implementations and technical documentation that enable other engineers to successfully adopt the platform
- Champion quality engineering practices including comprehensive testing, type safety, and observability
Requirements:
- 8+ years of software engineering experience with significant time spent building platform infrastructure, developer tools, SDKs, or distributed systems
- Production experience with LLM/AI systems—you've built and operated systems using OpenAI, Anthropic, or similar providers, and understand the unique challenges (token limits, non-determinism, provider outages, model deprecations)
- Strong TypeScript expertise—this is our company standard, and you'll be designing APIs that other TypeScript developers consume
- Experience designing APIs and abstractions that other engineers love to use—you understand the balance between power and simplicity
- Understanding of safety and compliance in AI systems—PII handling, guardrails, audit logging, and responsible AI practices
- Experience with event-driven architectures and async processing patterns (EventBridge, SQS, or similar)
- Understanding of observability and monitoring for distributed systems—metrics, tracing, alerting, and debugging production issues
- Strong communication and technical writing skills—ability to document systems clearly and work with internal customers across multiple teams
- Track record of technical leadership with or without formal management responsibility: influencing architecture, mentoring engineers, and driving technical decisions
- Experience with cloud infrastructure (AWS preferred: Fargate, DynamoDB, RDS, S3, EventBridge)
- Experience building SDK or platform products consumed by multiple teams
- Experience with prompt engineering, prompt management systems, or LLM evaluation frameworks
- Familiarity with NestJS, Prisma, or similar TypeScript backend frameworks
- Experience with streaming architectures (SSE, WebSockets) for real-time AI applications
- Background in building multi-tenant platform infrastructure
- Experience with hexagonal architecture / ports and adapters patterns
- Contributions to open-source LLM tooling or frameworks