Educology Solutions is seeking a Data Engineer to design, build, and operate Databricks-based Data & AI capabilities. The role involves engineering infrastructure for seamless data flow, building ETL/ELT pipelines, and enabling data scientists with high-quality datasets for machine learning and analytics.
Responsibilities:
- Build and scale Databricks AI/BI solutions end to end, combining governed semantic models, SQL, and performance-optimized query layers
- Develop and operationalize Databricks Genie experiences by curating datasets, metadata, and prompts for natural-language, self-service analytics
- Design and deliver Databricks dashboards and visual products that translate data into clear, actionable insights
- Design, implement, and optimize end-to-end data pipelines on Databricks, following Medallion Architecture principles
- Build robust and scalable ETL/ELT pipelines using Apache Spark and Delta Lake to transform raw (bronze) data into trusted, curated (silver) and analytics-ready (gold) data layers
- Operationalize Databricks Workflows for orchestration, dependency management, and pipeline automation
- Apply schema evolution and data versioning to support agile data development
- Connect and ingest data from enterprise systems such as PeopleSoft, D2L, and Salesforce using APIs, JDBC, or other integration frameworks
- Implement connectors and ingestion frameworks that accommodate structured, semi-structured, and unstructured data
- Design standardized data ingestion processes with automated error handling, retries, and alerting
- Develop data quality checks, validation rules, and anomaly detection mechanisms to ensure data integrity across all layers
- Integrate monitoring and observability tools (e.g., Databricks metrics, Grafana) to track ETL performance, latency, and failures
- Implement Unity Catalog or equivalent tools for centralized metadata management, data lineage, and governance policy enforcement
- Enforce data security best practices, including row-level security, encryption at rest/in transit, and fine-grained access control via Unity Catalog
- Design and implement data masking, tokenization, and anonymization for compliance with privacy regulations (e.g., GDPR, FERPA)
- Work with security teams to audit and certify compliance controls
- Enable data scientists by delivering high-quality, feature-rich datasets for model training and inference
- Support AIOps/MLOps lifecycle workflows using MLflow for experiment tracking, model registry, and deployment within Databricks
- Collaborate with AI/ML teams to create reusable feature stores and training pipelines
- Architect and manage data lakes on Azure Data Lake Storage (ADLS) or Amazon S3, and design ingestion pipelines to feed the bronze layer
- Build data marts and warehousing solutions using platforms like Databricks
- Optimize data storage and access patterns for performance and cost-efficiency
- Maintain technical documentation, architecture diagrams, data dictionaries, and runbooks for all pipelines and components
- Provide training and enablement sessions to internal stakeholders on the Databricks platform, Medallion Architecture, and data governance practices
- Conduct code reviews and promote reusable patterns and frameworks across teams
- Submit a weekly schedule of hours worked and progress reports outlining completed tasks, upcoming plans, and blockers
- Track deliverables against roadmap milestones and communicate risks or dependencies
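As a rough illustration of the bronze-to-silver promotion and data quality checks described above, the sketch below shows the kind of logic involved (plain Python for readability; in practice this would be implemented with Apache Spark and Delta Lake on Databricks, and all field names are hypothetical examples):

```python
# Minimal sketch of a bronze -> silver cleaning step: validate required
# fields, deduplicate on a natural key, normalize types, and collect
# rejected rows for automated alerting. Field names are illustrative only.

REQUIRED_FIELDS = ("student_id", "course_id")

def promote_to_silver(bronze_records):
    """Return (silver_rows, rejected_rows) from raw bronze records."""
    silver, rejects, seen = [], [], set()
    for rec in bronze_records:
        # Data quality check: required fields must be present and non-empty
        if any(not rec.get(f) for f in REQUIRED_FIELDS):
            rejects.append(rec)
            continue
        # Deduplicate on the natural key
        key = (rec["student_id"], rec["course_id"])
        if key in seen:
            continue
        seen.add(key)
        # Normalize types/formatting before promotion to the silver layer
        silver.append({**rec, "student_id": str(rec["student_id"]).strip()})
    return silver, rejects
```

In a production pipeline, the same checks would typically be expressed as Spark/Delta expectations, with rejected rows routed to the automated error handling and alerting called for above.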
Requirements:
- Hands-on experience with Databricks (Delta Lake, Apache Spark) and building AI/BI solutions, including dashboards, semantic models, and Genie-based natural-language analytics
- Deep understanding of ELT pipeline development, orchestration, and monitoring in cloud-native environments
- Experience implementing Medallion Architecture (Bronze/Silver/Gold) and working with data versioning and schema enforcement in enterprise-grade environments
- Strong proficiency in SQL, Python, or Scala for data transformations and workflow logic
- Proven experience integrating enterprise platforms (e.g., PeopleSoft, Salesforce, D2L) into centralized data platforms
- Familiarity with data governance, lineage tracking, and metadata management tools
- Experience with Databricks Unity Catalog for metadata management and access control
- Experience deploying ML models at scale using MLflow or similar MLOps tools
- Familiarity with cloud platforms like Azure or AWS, including storage, security, and networking aspects
- Knowledge of data warehouse design and star/snowflake schema modeling
- Prior experience with UMGC or USM preferred