HCLTech is seeking a Data Engineer to design, build, and operate scalable, event-driven data pipelines across AWS and hybrid environments. The role focuses on modernizing data architecture and transitioning legacy pipelines to MS SQL Server while ensuring reliability and performance of data ingestion and synchronization services.
Responsibilities:
- Build services for data ingestion and synchronization with source systems, ensuring near-real-time updates and reliable change propagation
- Ingest data from multiple sources using Python and ETL tooling (including AWS-native services)
- Design and implement event-driven / streaming architectures using AWS EventBridge, Kafka, or SNS/SQS for real-time data movement and processing (a publishing sketch follows this list)
- Design, implement, and maintain scalable data pipelines across on-prem + AWS cloud environments
- Develop efficient Python / PySpark applications to process large datasets, leveraging libraries such as pandas and NumPy (see the PySpark sketch after this list)
- Work with NoSQL databases (e.g., DynamoDB, MongoDB, Cassandra) to support high-performance storage and retrieval patterns
- Build and deploy components in a cloud-native architecture, ensuring scalability, resiliency, and secure-by-design implementation
- Implement strong monitoring and operational excellence practices: CloudWatch, logging, alerting, tracing, and proactive troubleshooting
- Transition existing pipelines to MS SQL Server and ensure performance, governance, and data quality are maintained or improved
- Collaborate closely with business/application owners to understand:
  - Current ingestion & pipeline architecture
  - Business logic and transformation patterns
  - Data consumption and analytics needs
- Design and document the target-state data architecture, including pipelines, processing patterns, and analytics enablement
- Identify opportunities to optimize, consolidate, and simplify pipelines, logic, and infrastructure
- Partner with the data team to decompose business logic into scalable transformation patterns and reusable components
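To make the event-driven responsibility above concrete, here is a minimal publishing sketch using boto3 and SNS (one of the services named in the role); the `order-events` topic ARN and the change-event shape are hypothetical placeholders, not part of the actual stack:

```python
import json

import boto3

# Hypothetical ARN -- substitute the topic provisioned for your pipeline.
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:order-events"

sns = boto3.client("sns")


def publish_change_event(table: str, key: str, operation: str, payload: dict) -> str:
    """Publish a change event so downstream consumers can react in near real time."""
    message = {
        "source": table,
        "key": key,
        "operation": operation,  # e.g. "INSERT", "UPDATE", "DELETE"
        "payload": payload,
    }
    response = sns.publish(
        TopicArn=TOPIC_ARN,
        Message=json.dumps(message),
        # Message attributes let SQS subscriptions filter by operation type.
        MessageAttributes={
            "operation": {"DataType": "String", "StringValue": operation}
        },
    )
    return response["MessageId"]
```

Fanning the topic out to SQS queues gives each downstream consumer its own buffered, independently retryable feed of changes.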
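Likewise, the PySpark processing work referenced above often resembles the sketch below, which deduplicates raw change records down to the latest version per key; the S3 paths, `order_id` business key, and `updated_at` ordering column are assumptions for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("daily-ingest").getOrCreate()

# Hypothetical S3 landing path; in practice this comes from job configuration.
raw = spark.read.parquet("s3://example-lake/raw/orders/")

# Keep only the most recent change per business key.
latest = (
    raw.withColumn(
        "rn",
        F.row_number().over(
            Window.partitionBy("order_id").orderBy(F.col("updated_at").desc())
        ),
    )
    .filter(F.col("rn") == 1)
    .drop("rn")
)

latest.write.mode("overwrite").parquet("s3://example-lake/curated/orders/")
```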
Requirements:
- AWS Glue (ETL pipeline development, orchestration, job tuning); a minimal job sketch follows this list
- Streaming / Messaging: Kafka and/or AWS SNS/SQS (event-driven data flows)
- Python (strong proficiency) and PySpark
- Data Lake concepts and implementation (e.g., S3-based lake patterns)
- Observability: CloudWatch and CloudTrail for monitoring and auditability
- Database design & SQL (relational modeling, performance-oriented querying)
- Hands-on experience building production-grade data pipelines and ingestion systems (batch + real-time)
- Strong understanding of data modeling, schema evolution, and transformation patterns
- Practical experience working in hybrid environments (on-prem + cloud)
- Ability to build reliable systems: idempotency, retries, DLQs, backpressure, monitoring, and runbooks (see the consumer sketch after this list)
- Strong communication skills and the ability to collaborate with both technical and business stakeholders
- AWS IAM (least privilege, roles/policies, secure access patterns)
- Amazon EKS (deploying/operating data services in Kubernetes)
- Experience migrating legacy pipelines to MS SQL Server or modernizing RDBMS-backed workloads
- Experience with governance, lineage, and audit requirements in regulated environments
- AWS certifications (e.g., Data Analytics Specialty / Solutions Architect)
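As an illustration of the Glue requirement above, a minimal Glue job script might look like the following; the `example_db` catalog database, `raw_orders` table, and curated output path are assumptions for the sketch, not the role's real resources:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue boilerplate: resolve job arguments and initialize contexts.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Hypothetical catalog database/table; read via the Glue Data Catalog.
source = glue_context.create_dynamic_frame.from_catalog(
    database="example_db", table_name="raw_orders"
)

# Write curated output to the data lake as Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://example-lake/curated/orders/"},
    format="parquet",
)

job.commit()
```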
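And for the reliability requirement (idempotency, retries, DLQs), here is a minimal consumer sketch, assuming a hypothetical `orders` SQS queue configured with a redrive policy and a hypothetical `processed-messages` DynamoDB table as the idempotency store:

```python
import json

import boto3

# Hypothetical names -- substitute the queue and table provisioned for your pipeline.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"
sqs = boto3.client("sqs")
dedup_table = boto3.resource("dynamodb").Table("processed-messages")


def handle(event: dict) -> None:
    """Placeholder for the real transform/load step; should be an idempotent upsert."""
    print(event)


def already_processed(message_id: str) -> bool:
    """Check the idempotency store for a prior successful run of this message."""
    return "Item" in dedup_table.get_item(Key={"message_id": message_id})


def consume_batch() -> None:
    # Long-poll SQS; messages that are not deleted stay on the queue and, once
    # the queue's maxReceiveCount is exceeded, the redrive policy moves them to the DLQ.
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    for msg in resp.get("Messages", []):
        try:
            if not already_processed(msg["MessageId"]):
                handle(json.loads(msg["Body"]))
                dedup_table.put_item(Item={"message_id": msg["MessageId"]})
        except Exception:
            continue  # skip the delete so SQS redelivers and retries this message
        # Delete only after success; everything before this point is retryable.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```

Because the idempotency marker is written only after processing succeeds, a crash between the two steps still causes a redelivery; that at-least-once gap is why `handle` itself should be an idempotent upsert.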