ServiceNow is a leading company focused on AI-driven business reinvention. They are seeking a Principal Engineer for their Data Platform to lead the technical vision and architecture of the FinOps Engineering Platform, ensuring a coherent and scalable system while overseeing the migration to a modern lakehouse architecture.
Responsibilities:
- Own the end-to-end technical architecture of the FinOps Engineering Platform, ensuring the GCS Data Warehouse, data platform, development platform, infrastructure, Forecast Engine, and FCR automation compose into one coherent, scalable system
- Lead the design and development of the GCS Data Warehouse and the program to migrate ServiceNow's Global Cloud Services data platform off Cloudera onto the modern lakehouse, with zero data loss and verified correctness
- Set the technical vision and multi-year roadmap for the platform, and translate it into the concrete standards and interfaces each workstream builds against
- Make the highest-leverage, hardest-to-reverse technical decisions: technology selection, system boundaries, data contracts, and the architectural patterns that span workstreams
- Establish platform-wide engineering standards for reliability, determinism, observability, security, and production readiness, and hold the bar across teams
- Lead through influence: partner with the Senior Staff engineers who own each workstream, review their designs, resolve cross-team architectural tensions, and align everyone to a single technical direction
- Drive innovation across the platform, including the responsible use of AI/ML tooling to accelerate development and improve platform capabilities
- Foster a culture of engineering craftsmanship, knowledge-sharing, and thoughtful quality practices across every team building on the platform
- Move fast: keep the platform shipping in tight, high-velocity loops while protecting the architectural integrity that lets it scale
- Define the reference architecture for the FinOps Engineering Platform and the contracts between its parts: how the data platform serves the Forecast Engine, how forecasts drive FCR automation, how the development platform productionizes analytics, and how all of it runs on the shared infrastructure
- Lead technical decision-making on the platform-wide technology stack, system boundaries, and architectural patterns, arbitrating trade-offs that no single workstream can resolve alone
- Establish best practices for data modeling, simulation and forecasting, pipeline development, orchestration, and platform scalability across the modern data stack
- Own the cross-cutting non-functional requirements: reliability, determinism and reproducibility, observability, security and compliance, performance, and cost
- Drive innovation in FinOps data analytics and forecasting, evaluating and adopting emerging technologies where they raise the platform's ceiling
- Lead the design of the GCS Data Warehouse, the modern lakehouse foundation (Trino, Iceberg, dbt, a modern catalog) that replaces the existing Cloudera-based platform (Impala, Hive, HDFS, Hive Metastore) and serves as the substrate for the entire FinOps Engineering Platform
- Own the migration strategy and sequencing: a phased, low-risk path that moves workloads off Cloudera incrementally rather than in a single high-risk cutover, with the legacy platform decommissioned only once each workload is verified on the new foundation
- Establish a full inventory and lineage of the existing platform first, covering the tables, transformations, scheduled jobs, and downstream consumers (Tableau, Lightdash, pipelines, the Forecast Engine), so that nothing is migrated blind and nothing is left stranded
- Define the data and schema translation approach: Hive/Impala schemas and partitioning onto Iceberg tables, legacy file formats onto the lakehouse, and HiveQL/Impala SQL and Spark transformations onto Trino SQL and dbt models
- Set the correctness bar for the migration: dual-run old and new in parallel and reconcile outputs against the source platform as ground truth, with fail-loud validation so any divergence is caught before cutover, never discovered after, at petabyte scale and with zero data loss
- Plan and execute consumer cutover and the retirement of the Cloudera cluster, capturing the infrastructure cost savings (a FinOps win the platform itself can measure) and the operational simplification of consolidating onto one modern stack
- Navigate enterprise constraints such as security, compliance, and approval processes while keeping the migration moving at pace
- Work autonomously with guidance from Engineering and FinOps leadership, owning the platform's technical direction
- Partner deeply with the Senior Staff engineers who own each workstream, aligning their designs to one architecture without taking the keyboard away from them
- Collaborate with DevOps, security, and platform teams on infrastructure, CI/CD, and compliance
- Partner with product managers, FinOps practitioners, finance, and capacity-planning stakeholders to ensure the platform serves how the business actually plans, budgets, and governs cloud spend
Requirements:
- Experience in leveraging or critically thinking about how to integrate AI into work processes, decision-making, or problem-solving. This may include using AI-powered tools, automating workflows, analyzing AI-driven insights, or exploring AI's potential impact on the function or industry
- 15+ years of experience in software or data engineering with a Bachelor's degree, or 12 years with a Master's degree, or 8 years with a PhD, in Computer Science, Engineering, or a related technical field, or equivalent experience; with a track record of architecting and delivering large-scale, cloud-native, data-intensive platforms
- Proven track record as the lead architect or top technical authority for a platform spanning multiple teams and workstreams, setting direction that others build against
- Proven experience leading a large data platform migration or modernization, ideally off a legacy Hadoop or Cloudera stack (Impala, Hive, HDFS, Spark) onto a modern lakehouse, including the inventory, dual-run reconciliation, consumer cutover, and decommission of the old platform
- Deep expertise across the modern data stack (Trino/Presto, dbt, Apache Iceberg, orchestration) and in distributed-systems and cloud-native architecture
- Strong systems and backend engineering depth, with the ability to go deep in any layer of the stack to make or unblock a hard technical decision
- Hands-on experience with cloud cost management and FinOps, including the data and economics behind capacity planning, forecasting, and reservations
- Demonstrated ability to operate at high velocity in greenfield environments with evolving requirements, shipping production-quality systems fast without sacrificing architectural integrity
- Strong knowledge of data structures, algorithms, object-oriented and data-oriented design, design patterns, and performance optimization
- Deep understanding of software quality principles including reliability, determinism, observability, security, and production readiness
- Ability to troubleshoot and reason about complex distributed systems and optimize performance and cost across the stack
- Full professional proficiency in English
- Comfort with development tools such as IDEs, debuggers, profilers, source control, and Unix-based systems
- Platform architecture: Designing and owning the architecture of large, multi-component platforms, including the contracts and boundaries between independently built subsystems
- Modern data stack & lakehouse: Trino/Presto, dbt, Apache Iceberg, Lightdash, query optimization at scale, and metadata, lineage, and governance
- Platform migration & modernization: Migrating off legacy Hadoop/Cloudera (Impala, Hive, HDFS, Hive Metastore, Spark, Oozie) onto a modern lakehouse, including schema and SQL translation, phased cutover, dual-run reconciliation against the source as ground truth, and zero-data-loss guarantees at petabyte scale
- Forecasting & simulation: Deterministic, reproducible computation, multi-period simulation or time-series forecasting, and reconciliation of forecasts against ground-truth actuals
- Cloud capacity & reservations: Hyperscaler capacity procurement, AWS/GCP capacity reservations (FCR), On-Demand Capacity Reservations (ODCR), and the lead-time and coordination constraints of reserving capacity ahead of demand
- Multi-cloud & infrastructure: Kubernetes, Infrastructure as Code (Terraform, CDK, CloudFormation), CI/CD and GitOps, and the AWS/GCP/Azure and on-premises landscape the platform runs on
- Reliability & observability: SLI/SLO/error-budget design, monitoring and alerting (Splunk, Grafana, Prometheus, CloudWatch, or similar), and operating data platforms in production
- Data contracts & quality: Fail-loud ingestion, upstream contract views, and correctness invariants enforced in code rather than assumed
- API & integration design: RESTful services, authentication (OAuth/SAML), and webhook/event integrations across systems
- Conference speaking experience and recognized thought leadership in data engineering, distributed systems, or FinOps
- Proven ability to work autonomously and drive cross-team technical decisions in ambiguous, greenfield environments
- Proven ability to lead through influence: setting technical direction and raising the bar across teams you do not manage
- Strong technical writing and documentation skills for both engineering- and business-facing audiences
- Excellent collaboration skills across engineering, DevOps, data, product, and finance stakeholders
- Ability to establish technical foundations for new products with long-term vision while delivering short-term results
- FinOps Certified Practitioner, AWS/GCP/Azure architecture certifications, or equivalent
- Open-source contributions to data engineering, FinOps, or distributed-systems tooling
- Experience with additional query and compute engines (Spark, Snowflake, BigQuery) and with high-performance systems languages (Rust, Go, C++)
- Experience with data validation frameworks (Great Expectations, dbt tests, etc.) and with Apache Iceberg or lakehouse architectures
- Patent applications or publications in data systems, forecasting, or cloud technologies