Judi Health is an enterprise health technology company providing a comprehensive suite of solutions for employers and health plans. The Senior Scalability Engineer will focus on streaming and realtime systems, owning the architecture and expansion of streaming data infrastructure, and ensuring reliable data processing while collaborating with cross-functional teams.
Responsibilities:
- Own streaming infrastructure: Design, implement, and expand WAL-based replication systems that process database changes through Kinesis to Snowflake and Redshift, handling millions of records while maintaining strict ordering and delivery guarantees
- Build CDC systems: Architect and implement change data capture infrastructure for cross-platform data synchronization, enabling realtime analytics and event-driven workflows across the organization
- Develop shared libraries: Create reusable Kinesis/SNS consumer patterns and libraries used across multiple teams, establishing best practices for event processing, error handling, and observability
- Partner with product teams: Work directly with teams to design and implement realtime data processing solutions tailored to their business needs, providing technical guidance and hands-on support
- Ensure data reliability: Implement exactly-once processing semantics, dead letter queues, retry strategies, and monitoring to guarantee data integrity across streaming pipelines
- Build observability: Develop monitoring, alerting, and dashboards for streaming pipelines to track throughput, lag, data quality issues, and system health using the LGTM stack
- Demonstrate technical leadership: Mentor engineers on streaming architecture patterns, lead design reviews for event-driven systems, and represent the Scalability team in cross-functional planning
- Make sound architectural decisions grounded in careful evaluation and prior experience with distributed systems
- Adhere to the Capital Rx Code of Conduct, including reporting of noncompliance
Requirements:
- 10+ years of software engineering experience with demonstrated progression into technical leadership roles
- 3+ years of experience leading technical initiatives, architecting distributed systems, or serving as a subject matter expert on streaming infrastructure
- Strong expertise in Python (Flask/SQLAlchemy) for production applications
- Deep PostgreSQL knowledge: Understanding of write-ahead logs, replication, logical decoding, and change data capture mechanisms
- Production streaming experience: Proven track record building and operating high-throughput streaming systems using Kinesis, Kafka, or similar event streaming platforms
- Distributed systems expertise: Strong understanding of ordering guarantees, exactly-once semantics, partition strategies, backpressure handling, and fault tolerance patterns
- AWS experience: Production experience with Kinesis, S3, SNS/SQS, Lambda, ECS, and data pipeline orchestration
- Data warehouse knowledge: Experience loading data into Redshift, Snowflake, or similar analytical databases
- Systems thinking: Ability to design resilient, observable streaming architectures that balance throughput, latency, and reliability
- Collaboration and communication: Strong written and verbal communication skills, with the ability to work autonomously while proactively driving collaboration in a remote environment
- Rust development experience or strong interest in learning Rust for high-performance systems
- Infrastructure as code: Experience with Terraform or similar IaC tools for managing cloud infrastructure
- Observability tools: Hands-on experience with Grafana, Prometheus, Loki, or similar monitoring/alerting platforms
- Event-driven architectures: Background designing event sourcing, CQRS, or other event-driven patterns
- Previous Pharmacy Benefits Manager (PBM) or healthcare technology experience