Red Cell Partners is an incubation firm focused on building and investing in scalable, technology-led companies. The firm is seeking a Senior Staff Technical Program Manager to oversee internal program execution across its AI platform, ensuring that platform investments lead to reliable and measurable outcomes.
Responsibilities:
- Own end-to-end execution of internal platform initiatives across the Trase operating system, translating ambiguous work across infrastructure, runtime systems, and AI/ML workflows into clear, actionable plans while ensuring alignment across Engineering, DevOps/SRE, DevEx, and Product
- Identify and manage cross-team dependencies across services, cloud infrastructure, and AI pipelines, sequencing work to minimize blocking dependencies, reduce integration risk, and avoid rework
- Establish and maintain a lightweight operating rhythm that drives execution, including milestone tracking, execution reviews, and release readiness checkpoints, ensuring teams have clear priorities, defined success criteria, and visibility into risks
- Partner with DevOps and SRE to ensure releases are safe, validated, and traceable, and that platform and AI/ML changes are observable, auditable, and ready for production environments; drive go/no-go decisions based on system readiness and risk
- Proactively identify and manage system-level risks across infrastructure, deployment systems, AI/ML pipelines, and runtime behavior, ensuring mitigation strategies are in place before issues impact delivery
- Define and track key execution and reliability signals, including delivery predictability, release success rates, dependency resolution, and system health, acting as the source of truth for execution status and risk
- Continuously improve engineering execution by identifying inefficiencies in CI/CD workflows, testing and integration systems, and AI workflow evaluation, partnering with DevEx and DevOps to increase developer velocity, release safety, and overall system reliability
Requirements:
- 12+ years of experience in technical program management, engineering, or related roles
- Experience working on distributed systems, cloud infrastructure, and CI/CD and deployment systems
- Strong understanding of DevOps/SRE workflows, system dependencies, and failure modes
- Demonstrated ability to break down ambiguous technical problems, drive execution across teams, and influence without authority
- Strong technical fluency, with the ability to read and understand production code, reason about system architecture and APIs, and engage in technical tradeoff discussions
- Experience with or exposure to AI/ML systems, LLM-based workflows, and AI infrastructure (inference, evaluation, orchestration)
- Ability to write code when needed (for debugging, validation, or prototyping), though coding is not a primary responsibility
- Experience working closely with DevOps/SRE teams and platform engineering teams
- Familiarity with Kubernetes, Infrastructure-as-Code, and observability systems
- Experience in regulated or high-security environments