Role Overview

Architect end-to-end data pipelines processing terabytes of IoT telemetry on Azure Databricks (PySpark DLT, Lakeflow) using medallion Lakehouse architecture.
Design and optimize real-time ingestion pipelines from Azure Event Hub and Apache Kafka for high-volume industrial IoT telemetry.
Build fault-tolerant, idempotent streaming architectures handling schema evolution, backpressure, and latency SLAs.
Lead architecture reviews, set engineering standards, and drive decisions on data modeling, pipeline design, and platform evolution.
Define technical direction for AI-ready data products including vector stores, embedding pipelines, and RAG-ready structured/unstructured data.
Adopt emerging LLM orchestration frameworks (LangChain, LangGraph) to accelerate GenAI platform capabilities.
Build production GenAI pipelines: RAG workflows, document ingestion, PII anonymization, and vector database infrastructure.
Collaborate with data scientists and AI engineers to deliver high-quality, AI-ready datasets that improve downstream model performance.
Enforce data governance, access control, and security policies; lead PII detection and anonymization strategies across the data platform.
Champion CI/CD practices using GitHub Actions, DAB, Octopus, and Bamboo for automated, reliable pipeline delivery.
Ensure compliance with enterprise security standards within the SDLC.
Mentor engineers across seniority levels through code reviews, pairing, and technical coaching.
Translate business and AI product requirements into clear technical roadmaps and execution plans.
Partner with data scientists, product owners, and architects to align data investments with Honeywell's autonomy strategy.

Requirements

8+ years of data engineering experience with at least 2 years in a lead or senior role, demonstrating progression in technical complexity and team leadership.
Hands-on experience building and operating medallion lakehouse architectures (Bronze / Silver / Gold).
Deep expertise in Apache Spark / PySpark with production experience on Azure Databricks at scale.
Strong proficiency with streaming platforms: Apache Kafka and/or Azure Event Hub for real-time IoT data.
Cloud data architecture skills (Azure preferred; AWS/GCP a plus) with experience designing scalable, cost-effective data lakes and warehouses using cloud-native services.
Data modeling and schema design expertise for both transactional and analytical workloads, including dimensional modeling and data vault methodologies.
Proven experience building data pipelines for GenAI or ML applications: RAG systems, embedding pipelines, and document ingestion.
MLOps familiarity including model versioning, feature stores, and monitoring/observability for data and ML systems.
Demonstrated ability to lead technical design reviews, mentor engineers, and drive architectural decisions with stakeholder buy-in.
Proficiency in CI/CD using GitHub Actions for automating data pipeline deployments.
Experience with LangChain, LangGraph, or other agentic AI orchestration frameworks.
Expertise in real-time data processing frameworks (Apache Spark Streaming, Structured Streaming).
Knowledge of MLOps practices and experience building data pipelines for AI model deployment.
Experience with time-series databases and IoT data modeling patterns.
Familiarity with containerization (Docker) and orchestration (Kubernetes) for AI workloads.
Strong background in data quality implementation for AI training data.
Experience working with distributed teams and cross-functional collaboration.
Knowledge of data security and governance practices for AI systems.
Experience working on analytics projects with Agile and Scrum methodologies.

US PERSON REQUIREMENTS: Due to compliance with U.S. export control laws and regulations, candidate must be a U.S. Person, which is defined as a U.S. citizen, a U.S. permanent resident, or have protected status in the U.S. under asylum or refugee status or have the ability to obtain an export authorization.

Tech Stack

Apache
AWS
Azure
Cloud
Docker
Google Cloud Platform
IoT
Kafka
Kubernetes
PySpark
SDLC
Spark
Vault

Benefits

In addition to a competitive salary, leading-edge work, and developing solutions side-by-side with dedicated experts in their fields, Honeywell employees are eligible for a comprehensive benefits package.
This package includes employer-subsidized Medical, Dental, Vision, and Life Insurance; Short-Term and Long-Term Disability; 401(k) match, Flexible Spending Accounts, Health Savings Accounts, EAP, and Educational Assistance; Parental Leave, Paid Time Off (for vacation, personal business, sick time, and parental leave), and 12 Paid Holidays.

Lead Data Engineer
