HCLTech is seeking a Data Engineer to design, build, and operate scalable, event-driven data pipelines across AWS and hybrid environments. The role focuses on modernizing data architecture and transitioning legacy pipelines to MS SQL Server while ensuring reliability and performance of data ingestion and synchronization services.
Responsibilities:
- Build services for data ingestion and synchronization with source systems, ensuring near-real-time updates and reliable change propagation
- Ingest data from multiple sources using Python and ETL tooling (including AWS-native services)
- Design and implement event-driven / streaming architectures using AWS EventBridge, Kafka, or SNS/SQS for real-time data movement and processing (a publishing sketch follows this list)
- Design, implement, and maintain scalable data pipelines across on-prem + AWS cloud environments
- Develop efficient Python / PySpark applications to process large datasets, leveraging libraries such as pandas and NumPy (see the PySpark sketch after this list)
- Work with NoSQL databases (e.g., DynamoDB, MongoDB, Cassandra) to support high-performance storage and retrieval patterns
- Build and deploy components in a cloud-native architecture, ensuring scalability, resiliency, and secure-by-design implementation
- Implement strong monitoring and operational excellence practices: CloudWatch, logging, alerting, tracing, and proactive troubleshooting
- Transition existing pipelines to MS SQL Server and ensure performance, governance, and data quality are maintained or improved
- Collaborate closely with business/application owners to understand:
  - Current ingestion & pipeline architecture
  - Business logic and transformation patterns
  - Data consumption and analytics needs
- Design and document the target-state data architecture, including pipelines, processing patterns, and analytics enablement
- Identify opportunities to optimize, consolidate, and simplify pipelines, logic, and infrastructure
- Partner with the data team to decompose business logic into scalable transformation patterns and reusable components
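To make the event-driven responsibility above concrete, here is a minimal publishing sketch using boto3 and SNS (one of the services named in the role); the `order-events` topic ARN and the change-event shape are hypothetical placeholders, not part of the actual stack:

```python
import json

import boto3

# Hypothetical ARN -- substitute the topic provisioned for your pipeline.
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:order-events"

sns = boto3.client("sns")


def publish_change_event(table: str, key: str, operation: str, payload: dict) -> str:
    """Publish a change event so downstream consumers can react in near real time."""
    message = {
        "source": table,
        "key": key,
        "operation": operation,  # e.g. "INSERT", "UPDATE", "DELETE"
        "payload": payload,
    }
    response = sns.publish(
        TopicArn=TOPIC_ARN,
        Message=json.dumps(message),
        # Message attributes let SQS subscriptions filter by operation type.
        MessageAttributes={
            "operation": {"DataType": "String", "StringValue": operation}
        },
    )
    return response["MessageId"]
```

Fanning the topic out to SQS queues gives each downstream consumer its own buffered, independently retryable feed of changes.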
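Likewise, the PySpark processing work referenced above often resembles the sketch below, which deduplicates raw change records down to the latest version per key; the S3 paths, `order_id` business key, and `updated_at` ordering column are assumptions for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("daily-ingest").getOrCreate()

# Hypothetical S3 landing path; in practice this comes from job configuration.
raw = spark.read.parquet("s3://example-lake/raw/orders/")

# Keep only the most recent change per business key.
latest = (
    raw.withColumn(
        "rn",
        F.row_number().over(
            Window.partitionBy("order_id").orderBy(F.col("updated_at").desc())
        ),
    )
    .filter(F.col("rn") == 1)
    .drop("rn")
)

latest.write.mode("overwrite").parquet("s3://example-lake/curated/orders/")
```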
Requirements:
- AWS Glue (ETL pipeline development, orchestration, job tuning); a minimal job sketch follows this list
- Streaming / Messaging: Kafka and/or AWS SNS/SQS (event-driven data flows)
- Python (strong proficiency) and PySpark
- Data Lake concepts and implementation (e.g., S3-based lake patterns)
- Observability: CloudWatch and CloudTrail for monitoring and auditability
- Database design & SQL (relational modeling, performance-oriented querying)
- Hands-on experience building production-grade data pipelines and ingestion systems (batch + real-time)
- Strong understanding of data modeling, schema evolution, and transformation patterns
- Practical experience working in hybrid environments (on-prem + cloud)
- Ability to build reliable systems: idempotency, retries, DLQs, backpressure, monitoring, and runbooks (see the consumer sketch after this list)
- Strong communication skills and the ability to collaborate with both technical and business stakeholders
- AWS IAM (least privilege, roles/policies, secure access patterns)
- Amazon EKS (deploying/operating data services in Kubernetes)
- Experience migrating legacy pipelines to MS SQL Server or modernizing RDBMS-backed workloads
- Experience with governance, lineage, and audit requirements in regulated environments
- AWS certifications (e.g., Data Analytics Specialty / Solutions Architect)
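As an illustration of the Glue requirement above, a minimal Glue job script might look like the following; the `example_db` catalog database, `raw_orders` table, and curated output path are assumptions for the sketch, not the role's real resources:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue boilerplate: resolve job arguments and initialize contexts.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Hypothetical catalog database/table; read via the Glue Data Catalog.
source = glue_context.create_dynamic_frame.from_catalog(
    database="example_db", table_name="raw_orders"
)

# Write curated output to the data lake as Parquet.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://example-lake/curated/orders/"},
    format="parquet",
)

job.commit()
```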
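And for the reliability requirement (idempotency, retries, DLQs), here is a minimal consumer sketch, assuming a hypothetical `orders` SQS queue configured with a redrive policy and a hypothetical `processed-messages` DynamoDB table as the idempotency store:

```python
import json

import boto3

# Hypothetical names -- substitute the queue and table provisioned for your pipeline.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"
sqs = boto3.client("sqs")
dedup_table = boto3.resource("dynamodb").Table("processed-messages")


def handle(event: dict) -> None:
    """Placeholder for the real transform/load step; should be an idempotent upsert."""
    print(event)


def already_processed(message_id: str) -> bool:
    """Check the idempotency store for a prior successful run of this message."""
    return "Item" in dedup_table.get_item(Key={"message_id": message_id})


def consume_batch() -> None:
    # Long-poll SQS; messages that are not deleted stay on the queue and, once
    # the queue's maxReceiveCount is exceeded, the redrive policy moves them to the DLQ.
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    for msg in resp.get("Messages", []):
        try:
            if not already_processed(msg["MessageId"]):
                handle(json.loads(msg["Body"]))
                dedup_table.put_item(Item={"message_id": msg["MessageId"]})
        except Exception:
            continue  # skip the delete so SQS redelivers and retries this message
        # Delete only after success; everything before this point is retryable.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```

Because the idempotency marker is written only after processing succeeds, a crash between the two steps still causes a redelivery; that at-least-once gap is why `handle` itself should be an idempotent upsert.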