SugarAI is redefining CRM for the age of AI. As a Senior Data Engineer, you will own the Databricks pipelines that power the Sugar Predict platform, ensuring production reliability and driving platform growth through customer onboarding and legacy modernization.
Responsibilities:
- Own Databricks production support for the Sugar Predict data platform, including monitoring, alerting, and incident response across all production data flows
- Maintain and report on SLA performance metrics for data pipeline delivery, ensuring visibility into platform health and accountability across internal and external stakeholders
- Identify and implement pipeline optimizations that reduce Databricks compute costs, improve throughput, and shorten processing windows, tracking the impact through measurable KPIs
- Migrate legacy ETL/ELT pipelines to Databricks, building automation tooling to reduce manual intervention and ensure uninterrupted data delivery during transitions
- Support new customer onboarding by provisioning, validating, and hardening tenant data pipelines that deliver reliable, isolated data from day one
- Design and build high-performance Databricks pipelines that ingest, transform, and serve ERP and CRM data at scale across both Azure and AWS environments
- Own the Delta Lake architecture including schema design, partitioning strategies, data quality enforcement, and incremental processing patterns
- Enforce data security best practices across Databricks environments, including role-based access control, secrets management, and compliance requirements for enterprise CRM and ERP data
- Implement data quality monitoring and observability across pipeline health and ML model inputs, ensuring data integrity that directly supports Sugar Predict prediction accuracy
- Apply and enforce multi-tenant data isolation patterns ensuring reliable, secure data delivery across Sugar Predict enterprise customers
- Partner with the Enterprise Architecture team to ensure Sugar Predict data pipelines integrate seamlessly with the broader SugarAI product ecosystem
- Support a globally distributed operation through on-call rotation and after-hours incident response, meeting SLAs across multiple time zones
- Maintain technical documentation, runbooks, and architectural decision records, contributing to team knowledge sharing and operational readiness across on-call and incident response scenarios
- Apply CI/CD best practices to data pipeline development, including version control, automated testing, and deployment tooling to ensure reliable and repeatable pipeline delivery
Requirements:
- 4+ years of data engineering experience
- At least 2 years on Databricks or the Apache Spark ecosystem across Azure and/or AWS
- Proficiency in PySpark, SQL, and Python, with a strong track record of building and operating production-grade pipelines under SLA constraints
- Hands-on experience with Delta Lake including schema evolution, ACID transactions, optimize/vacuum lifecycle, and both incremental and streaming processing patterns
- Hands-on experience with pipeline performance tuning and compute optimization in production Databricks environments
- Solid working knowledge of PostgreSQL including query optimization, schema design, and use as a source or sink in production data pipelines
- Experience supporting and maintaining legacy ETL tooling (SSIS, Informatica, custom Python/SQL pipelines, or similar) in production
- Experience supporting large-scale multi-tenant architectures with a focus on tenant isolation, per-tenant performance, and data privacy, including navigating tools and platforms that default to single-tenant assumptions
- Proven ability to work collaboratively across data science, product, and infrastructure teams, owning end-to-end delivery in a cross-functional environment
- Strong understanding of data governance, security, and compliance principles, including access control, data privacy, and protection of sensitive enterprise data across multi-tenant environments
- Experience operating Databricks workspaces across both Azure and AWS, including cost governance, cluster management, and cross-cloud data access
- Experience optimizing Databricks workloads in a Serverless environment, including compute cost governance and performance tuning for serverless compute
- Experience with Microsoft SQL Server in a data engineering or ETL context
- Exposure to ML feature engineering or feature stores (Databricks Feature Store, Feast, or similar) supporting predictive analytics
- Experience with customer onboarding automation or IaC patterns for provisioning tenant data pipelines at scale
- Databricks Certified Data Engineer Associate or Professional certification