
Role: AWS Site Reliability Engineer
Location: Atlanta, Charlotte, Raleigh (Hybrid)
Experience: 10+ years
Description:
Strong software engineering experience with cloud-native application development
Hands-on experience building solutions on AWS
Experience with event-driven architecture and asynchronous processing
Strong programming skills in Python, Java, or TypeScript
Experience designing APIs, microservices, and scalable backend systems
Understanding of reliability engineering principles, including resilience patterns, failover strategies, error budgets, SLOs and SLIs, disaster recovery, incident management, and operational readiness
Ability to define and assess non-functional requirements
Experience building data-driven scoring, maturity, or assessment frameworks
Familiarity with observability platforms and telemetry data
Experience integrating with enterprise systems such as ServiceNow, Jira, Splunk, Datadog, or PagerDuty
Understanding of AI-enabled automation, agentic workflows, or LLM-based assistants
Strong analytical, problem-solving, and communication skills