IDEMIA Public Security is a leading provider of secure and trusted biometric-based solutions. The company is seeking a highly organized and strategic technical leader to manage the SRE team behind its critical Identity Verification Platform. The role focuses on maintaining platform reliability and security while ensuring customer success.

Responsibilities:

Lead, mentor, and scale multiple SRE teams supporting critical production systems
Build a culture of ownership, accountability, and continuous improvement
Define team structure, capacity planning, and career development paths
Act as a senior leader during critical incidents and executive escalations
Act as the primary customer-facing support liaison during pre-sales engagements, post-sales integrations, and ongoing day-to-day operations
Partner closely with clients to resolve post-integration technical inquiries, guide them through IDEMIA's solutions, and ensure a seamless, fully supported experience throughout their entire lifecycle
Own system reliability, availability, resilience, and operational readiness
Define and manage SLIs, SLOs, and error budgets aligned with business and customer commitments
Lead incident management, root cause analysis (RCA), and post‑incident remediation in a blameless culture
Ensure production environments meet contractual, regulatory, and security requirements
Drive automation across infrastructure provisioning, deployments, monitoring, and recovery
Champion Infrastructure as Code, CI/CD pipelines, and self‑healing systems
Reduce operational toil and manual intervention through tooling and architectural improvements
Oversee the end-to-end deployment, operation, and active monitoring of a highly secure, high-traffic cloud infrastructure, including the implementation of robust alerting and comprehensive testing
Lead and guide the team in establishing best practices for monitoring, incident response, and quality assurance, ensuring reliability, scalability, and continuous improvement of the platform
Partner closely with Product and Development teams to ensure operational readiness for new features
Monitor and strategically optimize cloud infrastructure costs without compromising platform performance
Collaborate with compliance and security stakeholders to support audits and regulatory obligations
Communicate clearly with leadership on operational risks, reliability posture, and improvement plans

Requirements:

Bachelor's degree in Computer Science, Engineering, or equivalent experience
8+ years of experience in site reliability engineering, infrastructure, or DevOps roles
4+ years of people management experience, including managing senior engineers or other managers
Strong experience supporting high‑availability production systems
Deep understanding of Linux, distributed systems, networking, and cloud infrastructure
Proven experience with incident response, problem management, and operational excellence
Hands‑on experience with cloud platforms (AWS, Azure, and/or GCP)
Hands‑on experience with Infrastructure as Code (Terraform, CloudFormation, Ansible)
Hands‑on experience with CI/CD pipelines and deployment automation
Experience working in regulated or high‑security environments (government, public safety, identity, financial, or similar)
Background in software engineering or platform engineering
Experience with containerized and orchestrated environments (Docker, Kubernetes)
Familiarity with observability tools (Prometheus, Grafana, Datadog, Splunk, ELK)
Experience supporting compliance frameworks (SOC 2, ISO 27001, FedRAMP, CJIS, etc.)

Site Reliability Engineering Manager (Manager Digital Solutions)