Senior SDET – Performance Engineering
Location: Pune
Experience: 10+ Years
Role Overview
We are looking for a Senior SDET who specializes in performance engineering for cloud-native
Azure environments. The role focuses on driving scalability, reliability, and performance
validation across distributed microservice systems. The candidate will design automated
performance test frameworks, build simulators and mocks, define KPIs, and partner with
engineering, architecture, and SRE teams to ensure production-grade resilience.
Key Responsibilities
• Design and implement end-to-end performance and load testing strategies for
microservices-based systems
• Build custom simulators, traffic generators, and mocks for complex system dependencies
• Define, measure, and track performance KPIs (latency, throughput, error rate, saturation,
scalability limits)
• Develop fully automated performance test frameworks integrated into CI/CD pipelines
(Jenkins, GitHub Actions, GitLab CI)
• Execute load, stress, spike, endurance, and chaos testing
• Collaborate with architects, developers, product owners, and SRE teams to optimize
system performance
• Analyze bottlenecks across application, database, and infrastructure layers
• Work closely with Azure services (AKS, compute, storage, networking) for performance
tuning
• Implement observability using Prometheus, Grafana, and APM tools
• Optimize Redis caching, database queries (MariaDB, MySQL, etc.), and messaging systems
• Support resilience engineering and chaos testing (Chaos Monkey or equivalent)
• Drive RCA for performance issues and production incidents
• Contribute to capacity planning and scalability strategy
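To illustrate the KPI work listed above, here is a minimal Python sketch of how latency percentiles, throughput, and error rate might be derived from raw request samples. The `Sample` type and `summarize` helper are hypothetical names chosen for this example, not part of any existing framework; real test runs would normally take these figures from the load tool's own output.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Sample:
    latency_ms: float  # response time of a single request
    ok: bool           # True if the request succeeded


def summarize(samples: list[Sample], duration_s: float) -> dict[str, float]:
    """Compute basic KPIs for one test run (hypothetical helper)."""
    latencies = sorted(s.latency_ms for s in samples)
    # quantiles(n=100) returns 99 cut points: index 49 is p50, index 94 is p95.
    cuts = quantiles(latencies, n=100, method="inclusive")
    errors = sum(1 for s in samples if not s.ok)
    return {
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "throughput_rps": len(samples) / duration_s,
        "error_rate": errors / len(samples),
    }
```

Percentiles are used rather than averages because tail latency is usually what a performance target constrains.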
Required Skills
• Strong experience with performance testing tools (k6, JMeter, Gatling) and with building
custom frameworks
• Proficiency in scripting (Python, C#, Java, or similar)
• Deep understanding of distributed systems and microservices architecture
• Hands-on experience with Kubernetes (AKS preferred)
• Strong knowledge of the Azure cloud ecosystem
• Experience with CI/CD and DevOps practices
• Understanding of SRE principles (SLI/SLO, error budgets)
• Experience with observability and monitoring tools
• Strong database performance tuning expertise
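As a small illustration of the SLI/SLO and error-budget concepts named above, the following Python sketch computes how much of an error budget remains for a request-based availability SLO. The function name and the example figures are assumptions made for this illustration only.

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Return the fraction of the error budget left.

    1.0 means the budget is untouched, 0.0 means it is fully spent,
    and a negative value means the SLO has been breached.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total <= 0:
        return 1.0  # no traffic yet, so nothing has been spent
    # The budget is the number of failures the SLO tolerates over this traffic.
    allowed_failures = (1.0 - slo_target) * total
    return 1.0 - failed / allowed_failures
```

For example, with a 99.9% target over 1,000,000 requests the budget is 1,000 failed requests, so 250 failures leave about 75% of the budget.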
Preferred Skills
• Experience in contact center / SaaS platforms
• Exposure to Kafka, RabbitMQ
• Knowledge of AIOps, AI-driven testing, and anomaly detection
• Experience building custom performance tools or simulators
Qualifications
• Bachelor’s or Master’s degree in Computer Science or a related field
• 10+ years of experience in QA, development, and automation, with a strong focus on
performance engineering
Tech Stack
• Cloud: Azure (AKS, Compute, Storage)
• DevOps: Docker, Kubernetes, Terraform, CI/CD
• Monitoring: Prometheus, Grafana
• Databases: PostgreSQL, MongoDB, Redis
• Backend: Java, Spring Boot, APIs
• Messaging: Kafka / RabbitMQ
AI-Driven Performance Engineering (GenAI & AIOps)
• Leverage Generative AI (GenAI) to auto-generate performance test scenarios, workloads,
and synthetic datasets
• Implement AI-driven anomaly detection for identifying performance regressions and
system bottlenecks
• Use machine learning models for predictive capacity planning and workload forecasting
• Integrate AIOps tools for intelligent alerting, noise reduction, and automated root cause
analysis (RCA)
• Apply AI techniques for log analysis, pattern recognition, and failure prediction
• Build self-healing test systems with automated remediation triggers
• Enhance observability platforms (Prometheus, Grafana) with AI-based insights
• Utilize AI for dynamic test optimization based on real-time system behavior
• Collaborate with data science teams to implement advanced analytics for performance
insights
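To give a concrete sense of the anomaly detection mentioned above, here is a minimal Python sketch that flags metric values deviating sharply from a trailing baseline using a rolling z-score. This is a simple statistical stand-in, not an ML model or any vendor's algorithm; the function name, window size, and threshold are assumptions for illustration.

```python
from statistics import mean, stdev


def find_anomalies(values: list[float], window: int = 20,
                   threshold: float = 3.0) -> list[int]:
    """Return indices whose value deviates from the preceding `window`
    values by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: treat any change as anomalous.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies
```

A detector like this could run over p95 latency series exported from a monitoring system. Production setups typically also need to handle seasonality and sustained level shifts, which this sketch does not.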
AI & Observability Tooling (Real-World Examples)
• Azure Monitor + Application Insights (with AI capabilities): Smart detection, failure
anomaly detection, and automated root-cause insights
• Azure OpenAI / GenAI integrations: Generate performance scenarios, synthetic workloads,
and intelligent test data
• Dynatrace (Davis AI): Automatic dependency mapping, causal AI for root cause analysis,
and real-time anomaly detection
• Datadog AI / Watchdog: Automated anomaly detection, performance regression
identification, and alert correlation
• New Relic AI: Predictive alerting and performance intelligence across distributed systems
• Prometheus + Grafana (with ML plugins): Advanced metric analysis and anomaly detection
extensions
• Elastic Stack (ELK) with ML: Log anomaly detection, pattern recognition, and predictive
insights
• Chaos Engineering tools (Gremlin, Chaos Monkey): Integrated with observability platforms
for resilience validation
• k6 + AI-based extensions: Intelligent load modeling and performance insights
• Custom AI/ML pipelines: Python-based models for predictive scaling, workload modeling,
and anomaly detection