As Tech Lead SRE and a key member of Estreem’s Observability & FinOps competency center, you will drive visibility, reliability, and cost and revenue control across all of the platform’s cloud and application environments.
Your mission will be to establish and deploy SRE monitoring and optimization practices to enable proactive incident detection, performance optimization and sustainable cloud cost management across the various domains and Agile Release Trains.
Responsibilities
Manage and support SRE engineers: define team objectives, performance expectations and career paths, and foster a culture of visibility, measurement and continuous improvement.
Define, design and maintain the observability strategy and frameworks covering metrics, logs, traces, alerts, telemetry and dashboards across all domains.
Ensure system reliability through proactive monitoring, anomaly detection and actionable alerts.
Work closely with SRE engineers, architects and developers to integrate observability tools into CI/CD pipelines and workflows.
Lead incident analysis, post-mortems and continuous improvement efforts, leveraging telemetry and performance data.
Implement FinOps governance to track, forecast and optimize infrastructure costs, and produce cost-visibility reports and cost/revenue analyses for management.
Promote a culture of ownership, efficiency and data-driven decision-making across all teams.
Requirements
Engineering degree or Master’s degree in Computer Science
At least 5 years of experience, including a proven track record in a lead role
Fluent French (C2)
Professional English (B2–C1)
Expertise in Site Reliability Engineering (SRE) and observability tools (APM, logs, traces, metrics)