Design, build, and maintain Python-based test automation frameworks, not just individual test cases
Define reusable test libraries for validating data platforms and distributed systems
Drive automation standards, patterns, and best practices across teams
Validate Kafka-based event streams, including:
◦ Topic-level data validation
◦ Producer and consumer behavior
◦ Message schemas, payload integrity, ordering, and replay scenarios
◦ Failure handling, retries, and dead-letter scenarios
Test asynchronous workflows and event propagation across services
Validate end-to-end data flows across distributed services and pipelines
Test backend APIs, service integrations, and asynchronous processing layers
Perform schema validation, transformation checks, and data consistency and completeness validation
Test cloud-native data platforms built on AWS services such as S3, Glue, Redshift, and Lambda (or similar services)
Validate ingestion, processing, storage, and downstream consumption of data
Debug data and automation failures across multiple cloud services
Embed automation into CI/CD pipelines
Enforce quality gates and fail pipelines on critical data or platform issues
Provide actionable feedback to engineering teams based on automation results
Work closely with data engineers, platform engineers, and architects
Define test strategies for event-driven and distributed data systems
Proactively identify quality risks and gaps in platform design
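As a rough illustration of the "frameworks, not just individual test cases" expectation above, a shared validation helper might look something like the sketch below. The `Message` shape, the `validate_messages` name, and the example field names are hypothetical and not taken from any real platform; a real library would consume records through a Kafka client rather than from in-memory objects.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Iterable


@dataclass(frozen=True)
class Message:
    """Minimal stand-in for a consumed Kafka record (hypothetical shape)."""
    partition: int
    offset: int
    key: str
    value: dict[str, Any]


def validate_messages(
    messages: Iterable[Message],
    required_fields: dict[str, type],
) -> list[str]:
    """Return human-readable violations; an empty list means the batch passed."""
    violations: list[str] = []
    last_offset: dict[int, int] = {}  # highest offset seen so far, per partition

    for msg in messages:
        where = f"partition {msg.partition} offset {msg.offset}"

        # Ordering and replay: offsets must strictly increase within a partition.
        previous = last_offset.get(msg.partition)
        if previous is not None and msg.offset <= previous:
            violations.append(
                f"{where}: out of order or replayed (previous offset {previous})"
            )
        else:
            last_offset[msg.partition] = msg.offset

        # Payload integrity: every required field is present with the expected type.
        for name, expected in required_fields.items():
            if name not in msg.value:
                violations.append(f"{where}: missing field '{name}'")
            elif not isinstance(msg.value[name], expected):
                violations.append(
                    f"{where}: field '{name}' is not {expected.__name__}"
                )

    return violations
```

In a test suite, a check like this would typically be wrapped in a single assertion (for example, `assert validate_messages(batch, {"order_id": str}) == []`) so that every team reuses the same ordering and payload rules instead of re-implementing them per test.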
Requirements
Strong test automation engineering experience using Python
Hands-on experience testing Kafka in real production systems, not just theoretical knowledge
Proven experience testing distributed and event-driven systems
Solid understanding of data validation concepts, including:
◦ Schemas and contracts
◦ Transformations and enrichment
◦ Data consistency, completeness, and accuracy
Experience working on AWS-based data platforms
Ability to debug and troubleshoot issues across multiple services, rather than only logging defects
Engineering mindset with an ownership mentality
Nice to Have
Experience with schema registries and schema formats (Avro / JSON Schema / Protobuf)
Knowledge of streaming vs batch data architectures
Familiarity with observability, logging, and monitoring in distributed systems
Experience working in high-volume, near-real-time data environments