Reddit is a community of communities and is seeking a Senior Staff Engineer to serve as the Cloud Resources Technical Owner for the Ads Domain. This role involves defining the technical strategy for cloud resource management, optimizing costs, and leading engineering efforts to enhance cloud efficiency and observability.
Responsibilities:
- Define and drive the technical strategy for Cloud Resource management within Ads first, ensuring that cost accountability is built into the architecture of our systems
- Elevate cloud estimation from guesswork to a rigorous engineering discipline
- Lead the high-quality forecasting of new cloud investments and efficiency projects, designing data-driven models to validate technical ROI before builds happen
- Design and implement a roadmap for Cost Observability 2.0, moving beyond simple reporting to real-time, service/team-level spend attribution and automated anomaly detection
- Design and build internal platforms that programmatically enforce PnL accountability
- Engineer, or collaborate with Core Infrastructure partners to deliver, the dashboards, alerts, and governance tools that every Ads team relies on to manage their cloud footprint
- Architect automated frameworks for validating cost estimates and forecasting, replacing manual spreadsheets with data-driven software solutions
- Fight for observability by instrumenting deep telemetry into our cloud infrastructure
- Identify inefficiencies (e.g., underutilized clusters, uncompressed data flows) and re-architect critical paths for cost reduction
- Lead the technical validation of vendor and third-party tool integrations, ensuring we extract maximum engineering value from every dollar spent
- Act as a role model for the Ads domain and the wider company
- Set the standard for how engineering teams think about Cost as a Non-Functional Requirement
- Partner with Finance and Engineering leadership to translate Cloud Spend into actionable engineering tasks
Requirements:
- 10+ years of software engineering experience, with a strong focus on public cloud infrastructure (AWS/GCP/Azure) and large-scale distributed systems
- Engineer-First Mindset: You are comfortable writing code (Go, Python, Java) to solve infrastructure problems. You don't just ask for a report; you build the API that generates it
- Deep Cloud Expertise: You have mastery over Kubernetes, container orchestration, and cloud-native storage, understanding exactly how architectural choices impact the bottom line
- Operational Excellence: Proven track record of building observability pipelines (Prometheus, Grafana, Datadog) that drive operational and financial alerts
- Influential Leader: Skilled at driving clarity in ambiguous spaces. You can convince a Principal Engineer to refactor their service for cost efficiency because you can prove the technical and business value
- Experience building custom FinOps tooling or internal developer platforms
- Background in performance engineering or capacity planning for high-traffic ad tech environments
- Contributions to open-source projects related to cloud efficiency or observability