Design and implement backup and disaster recovery procedures for workspace configurations, notebooks, Unity Catalog metadata, and job definitions; maintain recovery runbooks and perform periodic DR testing aligned to RTO/RPO objectives
Monitor and optimize platform performance, including SQL warehouse query tuning, cluster autoscaling configuration, Photon enablement, and Delta Lake optimization guidance (OPTIMIZE, VACUUM, Z-ordering strategies)
Administer Delta Live Tables (DLT) pipelines and coordinate with data engineering teams on pipeline health, data quality monitoring, failed job remediation, and pipeline configuration best practices
Manage third-party integrations and ecosystem connectivity, including BI tool integrations (e.g., Power BI), and external metadata catalog integrations
Implement Databricks Asset Bundles (DABs) for standardized deployment patterns; automate workspace resource deployment (jobs, pipelines, dashboards) across SDLC environments using bundle-based CI/CD workflows
Conduct capacity planning and scalability analysis, including forecasting concurrent user/workload growth, platform scaling strategies, and proactive resource allocation during peak usage periods
Facilitate user onboarding and enablement, including new user/team onboarding procedures, training coordination, workspace access provisioning, and creation of self-service documentation/guides
Requirements
7+ years in cloud/data platform administration and operations, including 4+ years supporting Databricks or similar platforms
Experience with the Scrum framework, Agile engineering, Lean methodologies, or DevOps
Experience with one or more of the following: system development, software development, hardware development, or mission support
Experience working with DevOps CI/CD-related technologies (Azure DevOps, Git, Jenkins, Puppet, Docker, Confluence, SonarLint, and JUnit)
Hands-on experience administering Databricks (workspace administration, clusters/compute policies, jobs, SQL warehouses, repos, runtime management) and expertise using the Databricks CLI
Automation skills: scripting and/or IaC using Terraform/CLI/REST APIs for repeatable configuration and environment promotion
Experience implementing data governance controls (classification/tagging, lineage/metadata integrations) in partnership with governance teams
Experience applying CI/CD practices for promoting jobs, notebooks, and configuration across SDLC environments
Understanding of lakehouse concepts (e.g., Delta, table lifecycle management, separation of storage/compute)
SQL proficiency and data engineering fundamentals for troubleshooting query performance issues, understanding ETL/ELT workflow patterns, and debugging data pipeline failures; basic Python/Scala familiarity for notebook/code issue diagnosis
Experience with compliance and regulatory frameworks (FedRAMP, HIPAA, SOC2, or similar) including implementation of data residency requirements, retention policies, and audit-ready evidence collection
Hands-on experience with AWS security and networking services, including PrivateLink, Secrets Manager/Systems Manager integration, CloudWatch/CloudTrail integration, S3 bucket policies, cross-account access patterns, and KMS encryption key management
Experience administering Databricks serverless compute, Workspace Git integrations (GitLab), Databricks Asset Bundles (DABs) for deployment automation, and modern workspace features supporting DevOps workflows
SLA/SLO management and stakeholder communication skills; ability to define platform service levels, produce operational reports, translate technical issues to business stakeholders, and manage vendor relationships (Databricks account teams)
Tech Stack
AWS
Azure
Cloud
Docker
ETL
Jenkins
Puppet
Python
Scala
SDLC
SQL
Terraform
Unity Catalog
Benefits
Health insurance plans
Health Savings Account (HSA)
Dental
Vision
Long-term disability
Short-term disability
Basic term life insurance
Supplemental term life insurance for employees, spouses, and dependents