Business Technology Integrators (BTI) is a Service-Disabled Veteran-Owned Small Business (SDVOSB) with over 25 years of experience delivering innovative IT solutions to the Federal Government. BTI is seeking a Databricks Engineer to design, build, and operate a Data & AI platform. The role focuses on complex data workflows and scalable ELT pipelines that deliver high-quality data for machine learning, AI/BI, and analytics.

Responsibilities:

Design, implement, and optimize end-to-end data pipelines on Databricks, following the Medallion Architecture principles
Build robust and scalable ETL/ELT pipelines using Apache Spark and Delta Lake to transform raw (bronze) data into trusted, curated (silver) and analytics-ready (gold) data layers
Operationalize Databricks Workflows for orchestration, dependency management, and pipeline automation
Apply schema evolution and data versioning to support agile data development
Connect and ingest data from enterprise systems such as PeopleSoft, D2L, and Salesforce using APIs, JDBC, or other integration frameworks
Implement connectors and ingestion frameworks that accommodate structured, semi-structured, and unstructured data
Design standardized data ingestion processes with automated error handling, retries, and alerting
Develop data quality checks, validation rules, and anomaly detection mechanisms to ensure data integrity across all layers
Integrate monitoring and observability tools (e.g., Databricks metrics, Grafana) to track ETL performance, latency, and failures
Implement Unity Catalog or equivalent tools for centralized metadata management, data lineage, and governance policy enforcement
Enforce data security best practices including row-level security, encryption at rest/in transit, and fine-grained access control via Unity Catalog
Design and implement data masking, tokenization, and anonymization for compliance with privacy regulations (e.g., GDPR, FERPA)
Work with security teams to audit and certify compliance controls
Enable data scientists by delivering high-quality, feature-rich data sets for model training and inference
Support AIOps/MLOps lifecycle workflows using MLflow for experiment tracking, model registry, and deployment within Databricks
Collaborate with AI/ML teams to create reusable feature stores and training pipelines
Architect and manage data lakes on Azure Data Lake Storage (ADLS) or Amazon S3, and design ingestion pipelines to feed the bronze layer
Build data marts and warehousing solutions using platforms like Databricks
Optimize data storage and access patterns for performance and cost-efficiency
Maintain technical documentation, architecture diagrams, data dictionaries, and runbooks for all pipelines and components
Provide training and enablement sessions to internal stakeholders on the Databricks platform, Medallion Architecture, and data governance practices
Conduct code reviews and promote reusable patterns and frameworks across teams

Requirements:

Experience in designing, implementing, and optimizing end-to-end data pipelines on Databricks, following the Medallion Architecture principles
Proficiency in building robust and scalable ETL/ELT pipelines using Apache Spark and Delta Lake
Experience in operationalizing Databricks Workflows for orchestration, dependency management, and pipeline automation
Knowledge of schema evolution and data versioning to support agile data development
Ability to connect and ingest data from enterprise systems such as PeopleSoft, D2L, and Salesforce using APIs, JDBC, or other integration frameworks
Experience in implementing connectors and ingestion frameworks that accommodate structured, semi-structured, and unstructured data
Skill in designing standardized data ingestion processes with automated error handling, retries, and alerting
Experience in developing data quality checks, validation rules, and anomaly detection mechanisms
Knowledge of integrating monitoring and observability tools (e.g., Databricks metrics, Grafana) to track ETL performance, latency, and failures
Experience in implementing Unity Catalog or equivalent tools for centralized metadata management, data lineage, and governance policy enforcement
Knowledge of data security best practices including row-level security, encryption at rest/in transit, and fine-grained access control via Unity Catalog
Experience in designing and implementing data masking, tokenization, and anonymization for compliance with privacy regulations (e.g., GDPR, FERPA)
Ability to work with security teams to audit and certify compliance controls
Experience in enabling data scientists by delivering high-quality, feature-rich data sets for model training and inference
Knowledge of supporting AIOps/MLOps lifecycle workflows using MLflow
Experience in collaborating with AI/ML teams to create reusable feature stores and training pipelines
Experience in architecting and managing data lakes on Azure Data Lake Storage (ADLS) or Amazon S3
Skill in building data marts and warehousing solutions using platforms like Databricks
Ability to optimize data storage and access patterns for performance and cost-efficiency
Experience in maintaining technical documentation, architecture diagrams, data dictionaries, and runbooks for all pipelines and components
Ability to provide training and enablement sessions to internal stakeholders on the Databricks platform, Medallion Architecture, and data governance practices
Experience in conducting code reviews and promoting reusable patterns and frameworks across teams

Data Engineer (Databricks)
