Role Overview
- Architect end-to-end data pipelines processing terabytes of IoT telemetry on Azure Databricks (PySpark DLT, Lakeflow) using medallion Lakehouse architecture.
- Design and optimize real-time ingestion pipelines from Azure Event Hub and Apache Kafka for high-volume industrial IoT telemetry.
- Build fault-tolerant, idempotent streaming architectures handling schema evolution, backpressure, and latency SLAs.
- Lead architecture reviews, set engineering standards, and drive decisions on data modeling, pipeline design, and platform evolution.
- Define technical direction for AI-ready data products including vector stores, embedding pipelines, and RAG-ready structured/unstructured data.
- Adopt emerging LLM orchestration frameworks (LangChain, LangGraph) to accelerate GenAI platform capabilities.
- Build production GenAI pipelines: RAG workflows, document ingestion, PII anonymization, and vector database infrastructure.
- Collaborate with data scientists and AI engineers to deliver high-quality, AI-ready datasets that improve downstream model performance.
- Enforce data governance, access control, and security policies; lead PII detection and anonymization strategies across the data platform.
- Champion CI/CD practices using GitHub Actions, Databricks Asset Bundles (DAB), Octopus, and Bamboo for automated, reliable pipeline delivery.
- Ensure compliance with enterprise security standards within the SDLC.
- Mentor engineers across seniority levels through code reviews, pairing, and technical coaching.
- Translate business and AI product requirements into clear technical roadmaps and execution plans.
- Partner with data scientists, product owners, and architects to align data investments with Honeywell's autonomy strategy.
Requirements
- 8+ years of data engineering experience with at least 2 years in a lead or senior role, demonstrating progression in technical complexity and team leadership.
- Hands-on experience building and operating medallion lakehouse architectures (Bronze / Silver / Gold).
- Deep expertise in Apache Spark / PySpark with production experience on Azure Databricks at scale.
- Strong proficiency with streaming platforms: Apache Kafka and/or Azure Event Hub for real-time IoT data.
- Cloud data architecture skills (Azure preferred; AWS/GCP a plus) with experience designing scalable, cost-effective data lakes and warehouses using cloud-native services.
- Data modeling and schema design expertise for both transactional and analytical workloads, including dimensional modeling and data vault methodologies.
- Proven experience building data pipelines for GenAI or ML applications: RAG systems, embedding pipelines, and document ingestion.
- MLOps familiarity including model versioning, feature stores, and monitoring/observability for data and ML systems.
- Demonstrated ability to lead technical design reviews, mentor engineers, and drive architectural decisions with stakeholder buy-in.
- Proficiency in CI/CD using GitHub Actions for automating data pipeline deployments.
- Experience with LangChain, LangGraph, or other agentic AI orchestration frameworks.
- Expertise in real-time data processing frameworks (Apache Spark Streaming, Structured Streaming).
- Knowledge of MLOps practices and experience building data pipelines for AI model deployment.
- Experience with time-series databases and IoT data modeling patterns.
- Familiarity with containerization (Docker) and orchestration (Kubernetes) for AI workloads.
- Strong background in data quality implementation for AI training data.
- Experience working with distributed teams and cross-functional collaboration.
- Knowledge of data security and governance practices for AI systems.
- Experience working on analytics projects with Agile and Scrum methodologies.
US PERSON REQUIREMENTS:
Due to compliance with U.S. export control laws and regulations, the candidate must be a U.S. Person, which is defined as a U.S. citizen, a U.S. permanent resident, or a person with protected status in the U.S. under asylum or refugee status, or must have the ability to obtain an export authorization.
Tech Stack
- Apache
- AWS
- Azure
- Cloud
- Docker
- Google Cloud Platform
- IoT
- Kafka
- Kubernetes
- PySpark
- SDLC
- Spark
- Data Vault
Benefits
- In addition to a competitive salary, leading-edge work, and developing solutions side-by-side with dedicated experts in their fields, Honeywell employees are eligible for a comprehensive benefits package.
- This package includes employer-subsidized Medical, Dental, Vision, and Life Insurance; Short-Term and Long-Term Disability; 401(k) match, Flexible Spending Accounts, Health Savings Accounts, EAP, and Educational Assistance; Parental Leave, Paid Time Off (for vacation, personal business, sick time, and parental leave), and 12 Paid Holidays.