Filevine is a Legal AI company delivering Legal Operating Intelligence for the future of legal work. The company is seeking a VP of Engineering, Reliability to lead strategy and operations for the teams responsible for infrastructure and site reliability, ensuring the organization can scale effectively while maintaining high standards of service and compliance.
Responsibilities:
- Define and execute the reliability engineering roadmap, aligning infrastructure and AI-native architecture with Filevine’s enterprise growth and platform modernization
- Balance centralized platform capabilities with distributed ownership, ensuring the reliability model scales across a diversifying technology portfolio
- Establish and manage SLO/SLI/error budget frameworks to create a shared language for balancing feature velocity with system stability
- Lead infrastructure cost management (optimization and forecasting), capacity planning, and disaster recovery to meet rigorous enterprise contractual commitments
- Lead and scale a multi-disciplinary organization (DevOps, SRE, DBRE, Tooling), fostering a culture of ownership, high craftsmanship, and clear career growth
- Drive continuous improvement through DORA metrics, incident trend analysis, and systematic toil reduction to enhance service availability and deployment health
- Deliver self-service tooling, guardrails, and documentation that allow feature teams to operate their own services effectively without bottlenecks
- Act as the primary engineering interface for the CISO to advance compliance posture (FedRAMP, SOC 2, CJIS, ISO) and translate security needs into pragmatic action
- Collaborate with the CTO, CPO, and Architect to communicate risks and investment needs, positioning reliability as a key enabler for enterprise go-to-market success
Requirements:
- 15+ years of engineering experience, with 7+ years specifically leading infrastructure, reliability, or platform teams at scale in product-driven companies
- Proven track record managing organizations of 40+ engineers across SRE, DevOps, and Tooling, including developing multiple layers of management
- Demonstrated experience evolving reliability operating models to meet the shifting needs of a scaling business
- Deep expertise operating in regulated sectors (Legal Tech, Fintech, Government, or Healthcare) where compliance and data sensitivity are primary constraints
- Practical, production-hardened understanding of SRE principles, including SLOs, error budgets, toil reduction, and incident management
- Strong technical command of AWS, container orchestration, Terraform (IaC), CI/CD, and modern observability stacks
- Direct experience owning cloud infrastructure budgets and successfully driving meaningful cost optimization and forecasting
- Familiarity with the reliability requirements for modern AI workloads, such as model serving, vector search, and data pipeline integrity
- Ability to engage the C-suite on risk trade-offs and transformation progress, paired with a 'builder mentality' and a drive to solve complex, high-stakes problems