Scalence L.L.C. is seeking a Principal Data Platform Engineer to establish a greenfield data and AI platform for federal background investigations. The role involves designing and building the data platform architecture, partnering with cloud engineering, and ensuring compliance with federal requirements.

Responsibilities:

Design and stand up the organization's data and AI platform from the ground up — architecture, compute, storage, and the lakehouse foundation
Codify the platform as infrastructure-as-code (Terraform) and build the CI/CD pipelines that promote work from development through to the accredited production environment
Establish data governance, cataloging, lineage, and fine-grained access control as foundational, not bolted on later
Build and own the ingestion, transformation, and pipeline layer that turns raw and synthetic data into governed, analysis-ready data products
Design the platform to operate within FedRAMP Moderate, NIST 800-171, and CUI constraints, treating compliance as a first-class architectural requirement
Define the artifact promotion process so only signed, validated artifacts cross into the accredited environment
Partner with cloud engineering across the infrastructure/security boundary, with clear ownership of the in-platform layer
Enable the data science and ML team with the platform capabilities, governed data, and tooling they need to ship models and AI features into the product
Own platform reliability, performance, and cost discipline as usage scales
Set the engineering standards, patterns, and documentation a growing data team will build on

Requirements:

U.S. citizenship required
Must be able to obtain and maintain a T5/SSBI federally adjudicated clearance; active clearance preferred
8+ years in data engineering / data platform engineering, with demonstrated principal-level ownership
Has stood up a data platform or lakehouse from scratch — owning the architecture and build end to end, not operating an inherited one
Design of batch (and, where needed, streaming) data pipelines and SQL-based transformations on a lakehouse/Delta foundation, with sound analytical data modeling
Infrastructure-as-code (Terraform) and CI/CD for data workloads, including environment promotion from development to production
Platform-level data governance: cataloging, lineage, and fine-grained access control
Hands-on cloud experience with a major provider (Azure preferred; AWS or GCP considered)
Strong proficiency in Python and SQL
Track record partnering across an infrastructure/security boundary and setting technical standards for other engineers
Excellent analytical, troubleshooting, and communication skills
Bachelor's in a technical field or equivalent experience

Preferred:

Hands-on Databricks: Unity Catalog, Databricks Asset Bundles, MLflow
Experience in regulated or accredited environments: FedRAMP, NIST 800-171, CMMC, CUI handling, or the ATO/RMF process
Active security clearance (T5/SSBI or higher)
Government or defense contracting experience
Familiarity with MLOps patterns (model registry, model serving) to support a data science team
Cost governance / FinOps discipline for cloud data platforms
Spark / PySpark — relevant since the platform is Databricks, though the data volume here does not demand distributed-scale expertise
