Own end-to-end execution of internal platform initiatives across the Trase operating system, translating ambiguous work across infrastructure, runtime systems, and AI/ML workflows into clear, actionable plans while ensuring alignment across Engineering, DevOps/SRE, DevEx, and Product.
Identify and manage cross-team dependencies across services, cloud infrastructure, and AI pipelines, sequencing work to minimize blocking dependencies, reduce integration risk, and avoid rework.
Establish and maintain a lightweight operating rhythm that drives execution, including milestone tracking, execution reviews, and release readiness checkpoints, ensuring teams have clear priorities, defined success criteria, and visibility into risks.
Partner with DevOps and SRE to ensure releases are safe, validated, and traceable, and that platform and AI/ML changes are observable, auditable, and ready for production environments; drive go/no-go decisions based on system readiness and risk.
Proactively identify and manage system-level risks across infrastructure, deployment systems, AI/ML pipelines, and runtime behavior, ensuring mitigation strategies are in place before issues impact delivery.
Define and track key execution and reliability signals, including delivery predictability, release success rates, dependency resolution, and system health, acting as the source of truth for execution status and risk.
Continuously improve engineering execution by identifying inefficiencies in CI/CD workflows, testing and integration systems, and AI workflow evaluation, partnering with DevEx and DevOps to increase developer velocity, release safety, and overall system reliability.
Requirements
12+ years of experience in technical program management, engineering, or related roles
Experience working on distributed systems, cloud infrastructure, and CI/CD and deployment systems
Strong understanding of DevOps / SRE workflows, system dependencies, and failure modes
Demonstrated ability to break down ambiguous technical problems, drive execution across teams, and influence without authority
Strong technical fluency, with the ability to read and understand production code, reason about system architecture and APIs, and engage in technical tradeoff discussions
Experience with or exposure to AI/ML systems, LLM-based workflows, and AI infrastructure (inference, evaluation, orchestration)
Ability to write code when needed (for debugging, validation, or prototyping), though not a primary responsibility
Experience working closely with DevOps / SRE teams and platform engineering teams (Strongly Preferred)
Familiarity with Kubernetes, Infrastructure-as-Code, and observability systems (Strongly Preferred)
Experience in regulated or high-security environments (Strongly Preferred)
High ownership and accountability
Tech Stack
Cloud
Distributed Systems
Kubernetes
Benefits
Career-track opportunity with potential for rapid advancement, based on strong performance, as the firm grows.
100% employer-paid comprehensive health care, including medical, dental, and vision, for you and your family.
Paid maternity and paternity leave for 14 weeks at the employee's normal pay.
Unlimited PTO, with management approval.
Opportunities for professional development and continued learning.
Optional 401(k), FSA, and equity incentives available.
Mental health benefits are available through Tara Mind.
Cost-effective GLP-1 solutions available through Crux.