IDEMIA Public Security is a leading provider of secure and trusted biometric-based solutions. The company is seeking a highly organized, strategic technical leader to manage the SRE team behind its critical Identity Verification Platform, with a focus on maintaining platform reliability and security while ensuring customer success.
Responsibilities:
- Lead, mentor, and scale multiple SRE teams supporting critical production systems
- Build a culture of ownership, accountability, and continuous improvement
- Define team structure, capacity planning, and career development paths
- Act as a senior leader during critical incidents and executive escalations
- Act as the primary customer-facing support liaison during pre-sales engagements, post-sales integrations, and ongoing day-to-day operations
- Partner closely with clients to resolve post-integration technical inquiries, guide them through our solutions, and ensure a seamless, fully supported experience throughout their entire lifecycle
- Own system reliability, availability, resilience, and operational readiness
- Define and manage SLIs, SLOs, and error budgets aligned with business and customer commitments
- Lead incident management, root cause analysis (RCA), and post‑incident remediation in a blameless culture
- Ensure production environments meet contractual, regulatory, and security requirements
- Drive automation across infrastructure provisioning, deployments, monitoring, and recovery
- Champion Infrastructure as Code, CI/CD pipelines, and self‑healing systems
- Reduce operational toil and manual intervention through tooling and architectural improvements
- Oversee the end-to-end deployment, operation, and active monitoring of a highly secure, high-traffic cloud infrastructure, including the implementation of robust alerting and comprehensive testing
- Lead and guide the team in establishing best practices for monitoring, incident response, and quality assurance, ensuring reliability, scalability, and continuous improvement of the platform
- Partner closely with Product and Development teams to ensure operational readiness for new features
- Monitor and strategically optimize cloud infrastructure costs without compromising platform performance
- Collaborate with compliance and security stakeholders to support audits and regulatory obligations
- Communicate clearly with leadership on operational risks, reliability posture, and improvement plans
Requirements:
- Bachelor's degree in Computer Science, Engineering, or equivalent experience
- 8+ years of experience in site reliability engineering, infrastructure, or DevOps roles
- 4+ years of people management experience, including senior engineers or managers
- Strong experience supporting high‑availability production systems
- Deep understanding of Linux, distributed systems, networking, and cloud infrastructure
- Proven experience with incident response, problem management, and operational excellence
- Hands‑on experience with cloud platforms (AWS, Azure, and/or GCP)
- Hands‑on experience with Infrastructure as Code (Terraform, CloudFormation, Ansible)
- Hands‑on experience with CI/CD pipelines and deployment automation
- Experience working in regulated or high‑security environments (government, public safety, identity, financial, or similar)
- Background in software engineering or platform engineering
- Experience with containerized and orchestrated environments (Docker, Kubernetes)
- Familiarity with observability tools (Prometheus, Grafana, Datadog, Splunk, ELK)
- Experience supporting compliance frameworks (SOC 2, ISO 27001, FedRAMP, CJIS, etc.)