Scalence L.L.C. is seeking a Principal Data Platform Engineer to establish a greenfield data and AI platform for federal background investigations. The role involves designing and building the data platform architecture, partnering with cloud engineering, and ensuring compliance with federal requirements.
Responsibilities:
- Design and stand up the organization's data and AI platform from the ground up — architecture, compute, storage, and the lakehouse foundation
- Codify the platform as infrastructure-as-code (Terraform) and build the CI/CD pipelines that promote work from development through to the accredited production environment
- Establish data governance, cataloging, lineage, and fine-grained access control as foundational capabilities, not bolted on later
- Build and own the ingestion, transformation, and pipeline layer that turns raw and synthetic data into governed, analysis-ready data products
- Design the platform to operate within FedRAMP Moderate, NIST 800-171, and CUI constraints, treating compliance as a first-class architectural requirement
- Define the artifact promotion process so only signed, validated artifacts cross into the accredited environment
- Partner with cloud engineering across the infrastructure/security boundary, with clear ownership of the in-platform layer
- Enable the data science and ML team with the platform capabilities, governed data, and tooling they need to ship models and AI features into the product
- Own platform reliability, performance, and cost discipline as usage scales
- Set the engineering standards, patterns, and documentation a growing data team will build on
Requirements:
- U.S. citizenship required
- Must be able to obtain and maintain a T5/SSBI federally adjudicated clearance; active clearance preferred
- 8+ years in data engineering / data platform engineering, with demonstrated principal-level ownership
- Has stood up a data platform or lakehouse from scratch — owning the architecture and build end to end, not operating an inherited one
- Experience designing batch (and, where needed, streaming) data pipelines and SQL-based transformations on a lakehouse/Delta foundation, with sound analytical data modeling
- Infrastructure-as-code (Terraform) and CI/CD for data workloads, including environment promotion from development to production
- Platform-level data governance: cataloging, lineage, and fine-grained access control
- Hands-on cloud experience with a major provider (Azure preferred; AWS or GCP considered)
- Strong proficiency in Python and SQL
- Track record partnering across an infrastructure/security boundary and setting technical standards for other engineers
- Excellent analytical, troubleshooting, and communication skills
- Bachelor's in a technical field or equivalent experience
Preferred Qualifications:
- Hands-on Databricks: Unity Catalog, Databricks Asset Bundles, MLflow
- Experience in regulated or accredited environments: FedRAMP, NIST 800-171, CMMC, CUI handling, or the ATO/RMF process
- Active security clearance (T5/SSBI or higher)
- Government or defense contracting experience
- Familiarity with MLOps patterns (model registry, model serving) to support a data science team
- Cost governance / FinOps discipline for cloud data platforms
- Spark / PySpark experience: relevant because the platform is Databricks, though the data volume here does not demand distributed-scale expertise