HCLTech, a global leader in Digital, Engineering, and Cloud services, is seeking passionate data engineering experts. The Senior Data Engineer will design and operationalize large-scale data systems that power next-generation AI models, with a focus on building advanced ETL/ELT pipelines and automating the preparation of training data at scale.
Responsibilities:
- Build robust automated ETL/ELT pipelines sourcing data from IoT, CRM, logs, and more
- Clean, transform, and engineer AI-ready features for ML model training
- Implement and manage Feature Stores (Tecton/Feast) to eliminate training‑serving skew
- Optimize data workflows using Spark/Flink for petabyte-scale processing
- Manage and index Vector Databases (Pinecone, Milvus) for Generative AI & RAG systems
- Automate data quality checks to detect anomalies, data poisoning, and bias
- Collaborate closely with ML Engineers on MLOps, data versioning, and lineage
- Partner with data scientists to ensure seamless experimentation and deployment
Requirements:
- Bachelor's/Master's in CS, IT, Engineering, or related fields
- 6–12+ years of experience in data engineering, with 2+ years in production AI/ML pipelines
- Strong expertise in Python, SQL, and distributed data frameworks
- Hands-on experience with Apache Spark, Kafka, and Flink
- Experience with Hugging Face Datasets, PyTorch/TensorFlow data pipelines
- Skilled in cloud-native tools (AWS Glue, Azure Data Factory, Google Vertex AI)
- Ability to process semi-structured & unstructured data (Parquet, Avro, JSON, text)