ServiceNow is a leading company focused on AI-driven business reinvention. It is seeking a Principal Engineer for its Data Platform to lead the technical vision and architecture of the FinOps Engineering Platform, ensuring a coherent and scalable system while overseeing the migration to a modern lakehouse architecture.

Responsibilities:

Own the end-to-end technical architecture of the FinOps Engineering Platform, ensuring the GCS Data Warehouse, data platform, development platform, infrastructure, Forecast Engine, and FCR automation compose into one coherent, scalable system
Lead the design and development of the GCS Data Warehouse and the program to migrate ServiceNow's Global Cloud Services data platform off Cloudera onto the modern lakehouse, with zero data loss and verified correctness
Set the technical vision and multi-year roadmap for the platform, and translate it into the concrete standards and interfaces each workstream builds against
Make the highest-leverage, hardest-to-reverse technical decisions: technology selection, system boundaries, data contracts, and the architectural patterns that span workstreams
Establish platform-wide engineering standards for reliability, determinism, observability, security, and production readiness, and hold the bar across teams
Lead through influence: partner with the Senior Staff engineers who own each workstream, review their designs, resolve cross-team architectural tensions, and align everyone to a single technical direction
Drive innovation across the platform, including the responsible use of AI/ML tooling to accelerate development and improve platform capabilities
Foster a culture of engineering craftsmanship, knowledge-sharing, and thoughtful quality practices across every team building on the platform
Move fast: keep the platform shipping in tight, high-velocity loops while protecting the architectural integrity that lets it scale
Define the reference architecture for the FinOps Engineering Platform and the contracts between its parts: how the data platform serves the Forecast Engine, how forecasts drive FCR automation, how the development platform productionizes analytics, and how all of it runs on the shared infrastructure
Lead technical decision-making on the platform-wide technology stack, system boundaries, and architectural patterns, arbitrating trade-offs that no single workstream can resolve alone
Establish best practices for data modeling, simulation and forecasting, pipeline development, orchestration, and platform scalability across the modern data stack
Own the cross-cutting non-functional requirements: reliability, determinism and reproducibility, observability, security and compliance, performance, and cost
Drive innovation in FinOps data analytics and forecasting, evaluating and adopting emerging technologies where they raise the platform's ceiling
Lead the design of the GCS Data Warehouse, the modern lakehouse foundation (Trino, Iceberg, dbt, a modern catalog) that replaces the existing Cloudera-based platform (Impala, Hive, HDFS, Hive Metastore) and serves as the substrate for the entire FinOps Engineering Platform
Own the migration strategy and sequencing: a phased, low-risk path that moves workloads off Cloudera incrementally rather than in a single high-risk cutover, with the legacy platform decommissioned only once each workload is verified on the new foundation
Establish full inventory and lineage of the existing platform first: the tables, transformations, scheduled jobs, and downstream consumers (Tableau, Lightdash, pipelines, the Forecast Engine), so nothing is migrated blind and nothing is left stranded
Define the data and schema translation approach: Hive/Impala schemas and partitioning onto Iceberg tables, legacy file formats onto the lakehouse, and HiveQL/Impala SQL and Spark transformations onto Trino SQL and dbt models
Set the correctness bar for the migration: dual-run old and new in parallel and reconcile outputs against the source platform as ground truth, with fail-loud validation so any divergence is caught before cutover, never discovered after, at petabyte scale and with zero data loss
Plan and execute consumer cutover and the retirement of the Cloudera cluster, capturing the infrastructure cost savings (a FinOps win the platform itself can measure) and the operational simplification of consolidating onto one modern stack
Navigate enterprise constraints (security, compliance, and approval processes) while keeping the migration moving at pace
Work autonomously with guidance from Engineering and FinOps leadership, owning the platform's technical direction
Partner deeply with the Senior Staff engineers who own each workstream, aligning their designs to one architecture without taking the keyboard away from them
Collaborate with DevOps, security, and platform teams on infrastructure, CI/CD, and compliance
Partner with product managers, FinOps practitioners, finance, and capacity-planning stakeholders to ensure the platform serves how the business actually plans, budgets, and governs cloud spend

Requirements:

Experience in leveraging or critically thinking about how to integrate AI into work processes, decision-making, or problem-solving. This may include using AI-powered tools, automating workflows, analyzing AI-driven insights, or exploring AI's potential impact on the function or industry
15+ years of experience in software or data engineering with a Bachelor's degree, or 12 years with a Master's degree, or 8 years with a PhD, in Computer Science, Engineering, or a related technical field, or equivalent experience; a track record of architecting and delivering large-scale, cloud-native, data-intensive platforms
Proven track record as the lead architect or top technical authority for a platform spanning multiple teams and workstreams, setting direction that others build against
Proven experience leading a large data platform migration or modernization, ideally off a legacy Hadoop or Cloudera stack (Impala, Hive, HDFS, Spark) onto a modern lakehouse, including the inventory, dual-run reconciliation, consumer cutover, and decommission of the old platform
Deep expertise across the modern data stack (Trino/Presto, dbt, Apache Iceberg, orchestration) and in distributed-systems and cloud-native architecture
Strong systems and backend engineering depth, with the ability to go deep in any layer of the stack to make or unblock a hard technical decision
Hands-on experience with cloud cost management and FinOps, including the data and economics behind capacity planning, forecasting, and reservations
Demonstrated ability to operate at high velocity in greenfield environments with evolving requirements, shipping production-quality systems fast without sacrificing architectural integrity
Strong knowledge of data structures, algorithms, object-oriented and data-oriented design, design patterns, and performance optimization
Deep understanding of software quality principles including reliability, determinism, observability, security, and production readiness
Ability to troubleshoot and reason about complex distributed systems and optimize performance and cost across the stack
Full professional proficiency in English
Comfort with development tools such as IDEs, debuggers, profilers, source control, and Unix-based systems
Platform architecture: Designing and owning the architecture of large, multi-component platforms, including the contracts and boundaries between independently built subsystems
Modern data stack & lakehouse: Trino/Presto, dbt, Apache Iceberg, Lightdash, query optimization at scale, and metadata, lineage, and governance
Platform migration & modernization: Migrating off legacy Hadoop/Cloudera (Impala, Hive, HDFS, Hive Metastore, Spark, Oozie) onto a modern lakehouse, including schema and SQL translation, phased cutover, dual-run reconciliation against the source as ground truth, and zero-data-loss guarantees at petabyte scale
Forecasting & simulation: Deterministic, reproducible computation, multi-period simulation or time-series forecasting, and reconciliation of forecasts against ground-truth actuals
Cloud capacity & reservations: Hyperscaler capacity procurement, AWS/GCP capacity reservations (FCR), On-Demand Capacity Reservations (ODCR), and the lead-time and coordination constraints of reserving capacity ahead of demand
Multi-cloud & infrastructure: Kubernetes, Infrastructure as Code (Terraform, CDK, CloudFormation), CI/CD and GitOps, and the AWS/GCP/Azure and on-premises landscape the platform runs on
Reliability & observability: SLI/SLO/error-budget design, monitoring and alerting (Splunk, Grafana, Prometheus, CloudWatch, or similar), and operating data platforms in production
Data contracts & quality: Fail-loud ingestion, upstream contract views, and correctness invariants enforced in code rather than assumed
API & integration design: RESTful services, authentication (OAuth/SAML), and webhook/event integrations across systems
Conference speaking experience and recognized thought leadership in data engineering, distributed systems, or FinOps
Proven ability to work autonomously and drive cross-team technical decisions in ambiguous, greenfield environments
Proven ability to lead through influence: setting technical direction and raising the bar across teams you do not manage
Strong technical writing and documentation skills for both engineering- and business-facing audiences
Excellent collaboration skills across engineering, DevOps, data, product, and finance stakeholders
Ability to establish technical foundations for new products with long-term vision while delivering short-term results
FinOps Certified Practitioner, AWS/GCP/Azure architecture certifications, or equivalent
Open-source contributions to data engineering, FinOps, or distributed-systems tooling
Experience with additional query and compute engines (Spark, Snowflake, BigQuery) and with high-performance systems languages (Rust, Go, C++)
Experience with data validation frameworks (Great Expectations, dbt tests, etc.) and with Apache Iceberg or lakehouse architectures
Patent applications or publications in data systems, forecasting, or cloud technologies

Principal Engineer - Data Platform
