Judi Health is an enterprise health technology company providing a comprehensive suite of solutions for employers and health plans. They are seeking a Senior Scalability Engineer focused on caching infrastructure and performance optimization to own the architecture and improvement of caching solutions across their platform.
Responsibilities:
- Own caching infrastructure: Design, implement, and maintain caching architecture using Valkey/Redis (ElastiCache) for high-throughput healthcare applications processing millions of transactions per day
- Build shared libraries: Develop and evolve caching libraries and patterns used across multiple engineering teams, establishing best practices for cache key design, invalidation strategies, and performance monitoring
- Partner with engineering teams: Work directly with product teams to design and implement caching solutions tailored to their specific use cases, providing technical guidance and hands-on support during implementation
- Drive performance optimization: Conduct deep performance analysis using profiling tools to identify bottlenecks beyond caching (database queries, application code, infrastructure) and deliver measurable improvements
- Establish performance standards: Define performance benchmarks, implement monitoring and alerting, and help teams measure the impact of optimizations through data-driven analysis
- Contribute to observability: Enhance the observability infrastructure (LGTM stack: Loki, Grafana, Tempo, Mimir) to track cache hit rates, latency patterns, and system performance metrics across the platform
- Demonstrate technical leadership: Mentor engineers on performance best practices, lead architecture reviews, and represent the Scalability team in cross-functional planning discussions
- Uphold the Capital Rx Code of Conduct: Adhere to the Code of Conduct, including reporting any noncompliance
Requirements:
- 10+ years of software engineering experience with demonstrated progression into technical leadership roles
- 3+ years of experience leading technical initiatives, mentoring engineers, or serving as a subject matter expert on complex systems
- Strong expertise in Python (Flask/SQLAlchemy) for production applications
- Deep PostgreSQL knowledge: Advanced query optimization, indexing strategies, triggers, stored procedures, query plan analysis, and experience with replication and clustering
- Production caching experience: Proven track record designing and implementing caching strategies at scale using Redis, Valkey, Memcached, or similar technologies
- Performance optimization expertise: Demonstrated ability to profile applications, identify bottlenecks, and deliver measurable performance improvements (latency reduction, throughput gains, cost savings)
- AWS experience: Production experience with Aurora RDS, Lambda, ElastiCache, EC2, ECS, and S3
- Systems thinking: Ability to analyze performance across the full stack (application, database, caching, infrastructure) and make sound architectural tradeoffs
- Collaboration and communication: Strong written and verbal communication skills, with the ability to work autonomously while driving proactive collaboration in a remote environment
- Rust development experience or strong interest in learning Rust for high-performance systems
- Infrastructure as code: Experience with Terraform or similar IaC tools for managing cloud infrastructure
- Observability tools: Hands-on experience with Grafana, Prometheus, Loki, or similar monitoring/alerting platforms
- High-throughput systems: Background in building systems that handle millions of requests per day with strict SLA requirements
- Previous Pharmacy Benefits Manager (PBM) or healthcare technology experience