NTT DATA is a leading business and technology services provider, committed to accelerating client success through responsible innovation. They are seeking an AI Data Engineer - Security with strong Kafka experience to design and operate large-scale event-streaming platforms, manage data ingestion, and optimize AWS Glue jobs.
Responsibilities:
- Design and operate large-scale, highly available Kafka event-streaming platforms, including partitioning strategies, consumer group optimization, schema management, and performance tuning
- Build API-first ingestion pipelines that pull data from REST/GraphQL APIs, handling auth (OAuth2, API keys), pagination, rate limits, retries/backoff, and webhooks; use Python to normalize/enrich data and land it cleanly in S3 (schema, partitioning, Parquet)
- Build and operate S3-based data lakes end to end, with layered zones (raw → harmonized → conformed → modeled), Glue Data Catalog, IAM/Secrets Manager, VPC endpoints, encryption, lifecycle/versioning, and cost/performance best practices (file sizing, compaction)
- Design and optimize AWS Glue jobs using PySpark/DynamicFrames, with bookmarks for incremental loads, dependency packaging, robust error handling, logging/metrics, and unit tests; tune jobs for scale and cost
- Write clean, parameterized, idempotent Airflow DAGs (sensors, SLAs, retries, alerts), manage dependencies across pipelines, and use Git-based CI/CD to promote changes safely
- Build Snowflake ELT models (staging/ODS/marts), tune performance (warehouse sizing, clustering, micro-partitions, caching), use Streams/Tasks/Snowpipe for CDC, and follow solid RBAC and data governance practices
Illustrative sketches of each of these areas follow the requirements list below.
Requirements:
- Strong expertise in Kafka (4-5 years), with hands-on experience designing and operating large-scale, highly available event-streaming platforms, including partitioning strategies, consumer group optimization, schema management, and performance tuning
- Strong hands-on experience pulling data from REST/GraphQL APIs with auth (OAuth2, API keys), pagination, rate limits, retries/backoff, and webhooks
- Strong Python skills to normalize/enrich data and land it cleanly into S3 (schema, partitioning, Parquet)
- Comfortable building/operating S3-based lakes with layered zones (raw → harmonized → conformed → modeled), Glue Data Catalog, IAM/Secrets Manager, VPC endpoints, encryption, lifecycle/versioning, and cost/perf best practices (file sizing, compaction)
- Experience designing and optimizing Glue jobs using PySpark/DynamicFrames, with bookmarks for incremental loads, dependency packaging, robust error handling, logging/metrics, and unit tests
- Ability to tune Glue jobs for scale and cost
- Ability to write clean, parameterized, idempotent DAGs (sensors, SLAs, retries, alerts), manage dependencies across pipelines, and use Git-based CI/CD to promote changes safely
- Experience building ELT models (staging/ODS/marts), tuning performance (warehouse sizing, clustering, micro-partitions, caching), using Streams/Tasks/Snowpipe for CDC, and following solid RBAC and data governance practices
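Illustrative sketches:
For the Kafka responsibility, a minimal sketch of consumer-group processing with manual commits, assuming the confluent-kafka client; the broker address, topic, and group id are placeholders, not values from this posting:
```python
from confluent_kafka import Consumer, KafkaError

def process(payload: bytes) -> None:
    """Placeholder for the normalization/enrichment step."""
    print(payload[:80])

consumer = Consumer({
    "bootstrap.servers": "broker1:9092",   # hypothetical broker
    "group.id": "security-events-etl",     # consumers sharing this id split partitions
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,           # commit manually after processing
})
consumer.subscribe(["security.events"])    # hypothetical topic

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            # end-of-partition is informational; anything else is a real error
            if msg.error().code() != KafkaError._PARTITION_EOF:
                raise RuntimeError(msg.error())
            continue
        process(msg.value())
        consumer.commit(msg, asynchronous=False)  # at-least-once delivery
finally:
    consumer.close()
```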
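For API-first ingestion into S3, a sketch of a cursor-paginated pull with retries/backoff, landed as date-partitioned Parquet; the endpoint, token, bucket, and field names are all hypothetical, and writing to an s3:// path assumes an S3-capable pyarrow filesystem (e.g. s3fs) is installed:
```python
import datetime

import pyarrow as pa
import pyarrow.parquet as pq
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
session.headers["Authorization"] = "Bearer <oauth2-access-token>"  # placeholder
session.mount("https://", HTTPAdapter(max_retries=Retry(
    total=5, backoff_factor=1.0,                  # exponential backoff
    status_forcelist=[429, 500, 502, 503, 504],   # retry on rate limits / 5xx
    allowed_methods=["GET"],
)))

records, cursor = [], None
while True:
    resp = session.get("https://api.example.com/v1/alerts",
                       params={"cursor": cursor, "limit": 500}, timeout=30)
    resp.raise_for_status()
    page = resp.json()
    records.extend(page["data"])
    cursor = page.get("next_cursor")                # cursor pagination
    if not cursor:
        break

for r in records:                                   # light normalization/enrichment
    r["ingest_date"] = datetime.date.today().isoformat()

table = pa.Table.from_pylist(records)
pq.write_to_dataset(table, root_path="s3://example-lake/raw/alerts",
                    partition_cols=["ingest_date"])  # Hive-style partitioning
```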
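For the lake-operations bullet, a sketch of versioning plus lifecycle tiering on a raw zone with boto3; the bucket name, prefixes, and retention windows are illustrative assumptions, not prescribed values:
```python
import boto3

s3 = boto3.client("s3")
BUCKET = "example-lake"  # hypothetical bucket

# Versioning protects against accidental overwrites/deletes in the raw zone
s3.put_bucket_versioning(
    Bucket=BUCKET,
    VersioningConfiguration={"Status": "Enabled"},
)

# Tier older raw files to cheaper storage and expire stale noncurrent versions
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={"Rules": [{
        "ID": "raw-zone-tiering",
        "Filter": {"Prefix": "raw/"},
        "Status": "Enabled",
        "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"},
                        {"Days": 90, "StorageClass": "GLACIER"}],
        "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
    }]},
)
```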
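For Glue + PySpark, a sketch of an incremental job where bookmarks are driven by `transformation_ctx` plus `job.commit()`; the database, table, and S3 paths are placeholders:
```python
import sys

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# transformation_ctx makes this source bookmarkable, so reruns read only new data
src = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="alerts", transformation_ctx="alerts_src")

# Example transform: round-trip through a DataFrame to deduplicate
deduped = DynamicFrame.fromDF(src.toDF().dropDuplicates(), glue_context, "deduped")

glue_context.write_dynamic_frame.from_options(
    frame=deduped, connection_type="s3", format="parquet",
    connection_options={"path": "s3://example-lake/harmonized/alerts/",
                        "partitionKeys": ["ingest_date"]},
    transformation_ctx="alerts_sink")

job.commit()  # advances the bookmark only on success
```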
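For Airflow, a sketch of a parameterized, idempotent daily DAG with a sensor, retries, and an SLA, assuming Airflow 2.x and the Amazon provider package; the bucket, keys, and task logic are hypothetical:
```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

def load_partition(ds: str, **_):
    """Rewrite exactly one date partition, so reruns/backfills are idempotent."""
    print(f"(re)loading s3://example-lake/raw/alerts/ingest_date={ds}/")

with DAG(
    dag_id="alerts_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={
        "retries": 3,
        "retry_delay": timedelta(minutes=5),
        "sla": timedelta(hours=2),          # flag runs that exceed two hours
    },
) as dag:
    wait_for_file = S3KeySensor(
        task_id="wait_for_file",
        bucket_name="example-lake",
        bucket_key="raw/alerts/ingest_date={{ ds }}/_SUCCESS",  # templated on logical date
        poke_interval=300,
        timeout=60 * 60,
    )
    load = PythonOperator(
        task_id="load_partition",
        python_callable=load_partition,
        op_kwargs={"ds": "{{ ds }}"},       # parameterized by run date
    )
    wait_for_file >> load
```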
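For Snowflake CDC, a sketch of a stream drained by a scheduled task, issued through the Python connector; the account credentials, warehouse, and table names are illustrative only:
```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",  # placeholders
    warehouse="ETL_WH", database="ANALYTICS", schema="STAGING",
)
cur = conn.cursor()

# The stream captures inserts/updates/deletes on the staging table
cur.execute("CREATE STREAM IF NOT EXISTS ALERTS_STREAM ON TABLE STG_ALERTS")

# The task drains the stream into the ODS layer only when new data exists;
# EXCLUDE drops the stream's metadata columns from the projection
cur.execute("""
    CREATE TASK IF NOT EXISTS LOAD_ALERTS_ODS
      WAREHOUSE = ETL_WH
      SCHEDULE  = '5 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('ALERTS_STREAM')
    AS
      INSERT INTO ODS_ALERTS
      SELECT * EXCLUDE (METADATA$ACTION, METADATA$ISUPDATE, METADATA$ROW_ID)
      FROM ALERTS_STREAM
""")

cur.execute("ALTER TASK LOAD_ALERTS_ODS RESUME")  # tasks are created suspended
```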