Empower is focused on transforming financial lives and offers a flexible work environment and opportunities for career growth. The company is seeking an API Reliability Engineer to build and operate reliable, scalable API services, troubleshoot complex issues, and improve system resilience.

Responsibilities:

Own and improve the reliability, performance, and scalability of API services in production
Troubleshoot and resolve P1/P2 production incidents end-to-end, analyzing issues across application, infrastructure, and integrations
Work closely with API developers to identify and address reliability issues and application-level security vulnerabilities in service design and implementation
Contribute targeted code-level or configuration fixes to resolve issues and prevent recurrence
Participate in root cause analysis (RCA) and drive durable, long-term fixes
Improve API resilience through patterns such as timeouts, retries, circuit breakers, and graceful degradation
Establish and enhance observability and service health monitoring, including logs, metrics, traces, and SLOs, using Datadog and Splunk
Define and monitor SLAs/SLOs for API performance and availability
Work with API Gateway and ALB/NLB for traffic management, routing, and system reliability
Contribute to CI/CD pipelines using Jenkins to ensure safe and consistent deployments
Contribute to disaster recovery readiness and system resilience planning
Collaborate across engineering teams to improve system design and operational readiness
Participate in an on-call rotation for critical incidents (P1/P2)

Requirements:

Minimum 5 years of experience in backend or API development
Strong hands-on experience with Java and Spring Boot
Proven experience building, shipping, and operating APIs in production environments
Strong problem-solving skills with the ability to debug real production issues end-to-end
Experience handling P1/P2 incidents in production environments
Solid understanding of API architecture, request lifecycle, and common failure patterns
Experience with AWS services such as API Gateway, ALB/NLB, EC2, ECS/EKS, Lambda, RDS, and DynamoDB
Familiarity with reliability patterns such as timeouts, retries, circuit breakers, and connection pooling
Experience with observability tools such as Datadog and/or Splunk
Experience with CI/CD pipelines, preferably Jenkins
Strong debugging skills in distributed systems
Experience with Git-based workflows and Agile development
Bachelor's degree in Computer Science, Information Systems, or a related field, or equivalent practical experience
AWS certifications such as Solutions Architect or Developer Associate
Experience with microservices and distributed system design
Exposure to SLAs/SLOs and service health metrics
Experience with Docker and Kubernetes
Familiarity with API gateways, traffic routing, and load balancing strategies
Experience in performance tuning and scalability improvements
Strong communication skills during high-severity incidents
