Lumenalta is a technology solutions company that partners with forward-thinking organizations to drive business growth. They are seeking a Senior Data Engineer with expertise in streaming architectures and Databricks to architect and deliver a production-grade Lakehouse platform for high-frequency sensor telemetry and complex data pipelines.
Responsibilities:
- Architect and implement data landing zones for streaming telemetry (Striim) and historical relational data (Oracle), enforcing strict asset- and handle-level segregation across ingestion layers
- Design and deploy end-to-end batch and streaming data pipelines within a Databricks Lakehouse environment, leveraging Delta Live Tables (DLT) for reliable state management and incremental processing
- Develop transformation logic to calculate descriptive statistics (min, max, avg, counts) at ingestion time, and build advanced Silver- and Gold-layer pipelines for complex multi-sensor time alignment
- Configure ETL clusters and SQL Warehouses; establish environment-wide monitoring, spend alerts, and cost projections to control cloud resource consumption and support FinOps accountability
- Lead production cutover activities, including environment hardening, catch-up batch processing for historical backfill, and phased loading of historical parameters
- Define and enforce data quality standards and pipeline observability practices across all layers of the Lakehouse
- Collaborate with cross-functional stakeholders to translate manufacturing and IoT domain requirements into robust, scalable data architecture decisions
Requirements:
- 5+ years in data engineering with a proven track record of delivering production-grade pipelines in complex, high-volume enterprise environments
- Advanced hands-on experience with Databricks—Delta Lake, Delta Live Tables (DLT), SQL Warehouses, and cluster management at scale
- Proven experience with Striim or equivalent Change Data Capture tools in high-volume manufacturing, IoT, or telemetry contexts
- Strong proficiency in Python/PySpark and advanced SQL for both pipeline development and complex transformation logic
- Experience managing high-frequency time-series datasets and performing multi-sensor synchronization across misaligned temporal streams
- Familiarity with cloud cost management practices, including cluster sizing, spend monitoring, and resource optimization within Databricks environments
- Comfortable owning go-live activities—environment hardening, cutover planning, and post-deployment stabilization in critical production settings