Business Technology Integrators (BTI) is a Service-Disabled Veteran-Owned Small Business (SDVOSB) with over 25 years of experience delivering innovative IT solutions to the Federal Government. BTI is seeking a Databricks Engineer to design, build, and operate a Data & AI platform. The role focuses on complex data workflows and scalable ELT pipelines that deliver high-quality data for machine learning, AI/BI, and analytics.
Responsibilities:
- Design, implement, and optimize end-to-end data pipelines on Databricks, following the Medallion Architecture principles
- Build robust and scalable ETL/ELT pipelines using Apache Spark and Delta Lake to transform raw (bronze) data into trusted, curated (silver) and analytics-ready (gold) data layers
- Operationalize Databricks Workflows for orchestration, dependency management, and pipeline automation
- Apply schema evolution and data versioning to support agile data development
- Connect and ingest data from enterprise systems such as PeopleSoft, D2L, and Salesforce using APIs, JDBC, or other integration frameworks
- Implement connectors and ingestion frameworks that accommodate structured, semi-structured, and unstructured data
- Design standardized data ingestion processes with automated error handling, retries, and alerting
- Develop data quality checks, validation rules, and anomaly detection mechanisms to ensure data integrity across all layers
- Integrate monitoring and observability tools (e.g., Databricks metrics, Grafana) to track ETL performance, latency, and failures
- Implement Unity Catalog or equivalent tools for centralized metadata management, data lineage, and governance policy enforcement
- Enforce data security best practices including row-level security, encryption at rest/in transit, and fine-grained access control via Unity Catalog
- Design and implement data masking, tokenization, and anonymization for compliance with privacy regulations (e.g., GDPR, FERPA)
- Work with security teams to audit and certify compliance controls
- Enable data scientists by delivering high-quality, feature-rich data sets for model training and inference
- Support AIOps/MLOps lifecycle workflows using MLflow for experiment tracking, model registry, and deployment within Databricks
- Collaborate with AI/ML teams to create reusable feature stores and training pipelines
- Architect and manage data lakes on Azure Data Lake Storage (ADLS) or Amazon S3, and design ingestion pipelines to feed the bronze layer
- Build data marts and warehousing solutions using platforms like Databricks
- Optimize data storage and access patterns for performance and cost-efficiency
- Maintain technical documentation, architecture diagrams, data dictionaries, and runbooks for all pipelines and components
- Provide training and enablement sessions to internal stakeholders on the Databricks platform, Medallion Architecture, and data governance practices
- Conduct code reviews and promote reusable patterns and frameworks across teams
Requirements:
- Experience in designing, implementing, and optimizing end-to-end data pipelines on Databricks, following the Medallion Architecture principles
- Proficiency in building robust and scalable ETL/ELT pipelines using Apache Spark and Delta Lake
- Experience in operationalizing Databricks Workflows for orchestration, dependency management, and pipeline automation
- Knowledge of schema evolution and data versioning to support agile data development
- Ability to connect and ingest data from enterprise systems such as PeopleSoft, D2L, and Salesforce using APIs, JDBC, or other integration frameworks
- Experience in implementing connectors and ingestion frameworks that accommodate structured, semi-structured, and unstructured data
- Skill in designing standardized data ingestion processes with automated error handling, retries, and alerting
- Experience in developing data quality checks, validation rules, and anomaly detection mechanisms
- Experience integrating monitoring and observability tools (e.g., Databricks metrics, Grafana) to track ETL performance, latency, and failures
- Experience in implementing Unity Catalog or equivalent tools for centralized metadata management, data lineage, and governance policy enforcement
- Knowledge of data security best practices including row-level security, encryption at rest/in transit, and fine-grained access control via Unity Catalog
- Experience in designing and implementing data masking, tokenization, and anonymization for compliance with privacy regulations (e.g., GDPR, FERPA)
- Ability to work with security teams to audit and certify compliance controls
- Experience in enabling data scientists by delivering high-quality, feature-rich data sets for model training and inference
- Experience supporting AIOps/MLOps lifecycle workflows using MLflow for experiment tracking, model registry, and deployment within Databricks
- Experience in collaborating with AI/ML teams to create reusable feature stores and training pipelines
- Experience in architecting and managing data lakes on Azure Data Lake Storage (ADLS) or Amazon S3
- Skill in building data marts and warehousing solutions using platforms like Databricks
- Ability to optimize data storage and access patterns for performance and cost-efficiency
- Experience in maintaining technical documentation, architecture diagrams, data dictionaries, and runbooks for all pipelines and components
- Ability to provide training and enablement sessions to internal stakeholders on the Databricks platform, Medallion Architecture, and data governance practices
- Experience in conducting code reviews and promoting reusable patterns and frameworks across teams