Judi Health is an enterprise health technology company providing a comprehensive suite of solutions for employers and health plans. They are seeking a Senior Scalability Engineer focused on streaming and realtime systems to own the architecture and expansion of their streaming data infrastructure while collaborating closely with clients and cross-functional teams.
Responsibilities:
- Own streaming infrastructure: Design, implement, and expand WAL-based replication systems that process database changes through Kinesis to Snowflake and Redshift, handling millions of records while maintaining strict ordering and delivery guarantees
- Build CDC systems: Architect and implement change data capture infrastructure for cross-platform data synchronization, enabling realtime analytics and event-driven workflows across the organization
- Develop shared libraries: Create reusable Kinesis/SNS consumer patterns and libraries used across multiple teams, establishing best practices for event processing, error handling, and observability
- Partner with product teams: Work directly with teams to design and implement realtime data processing solutions tailored to their business needs, providing technical guidance and hands-on support
- Ensure data reliability: Implement exactly-once processing semantics, dead letter queues, retry strategies, and monitoring to guarantee data integrity across streaming pipelines
- Build observability: Develop monitoring, alerting, and dashboards for streaming pipelines to track throughput, lag, data quality issues, and system health using the LGTM stack
- Demonstrate technical leadership: Mentor engineers on streaming architecture patterns, lead design reviews for event-driven systems, and represent the Scalability team in cross-functional planning
- Make sound architectural choices grounded in careful evaluation and prior experience with distributed systems
- Adhere to the Capital Rx Code of Conduct, including reporting noncompliance
Requirements:
- 10+ years of software engineering experience with demonstrated progression into technical leadership roles
- 3+ years of experience leading technical initiatives, architecting distributed systems, or serving as a subject matter expert on streaming infrastructure
- Strong expertise in Python (Flask/SQLAlchemy) for production applications
- Deep PostgreSQL knowledge: Understanding of write-ahead logs, replication, logical decoding, and change data capture mechanisms
- Production streaming experience: Proven track record building and operating high-throughput streaming systems using Kinesis, Kafka, or similar event streaming platforms
- Distributed systems expertise: Strong understanding of ordering guarantees, exactly-once semantics, partition strategies, backpressure handling, and fault tolerance patterns
- AWS experience: Production experience with Kinesis, S3, SNS/SQS, Lambda, ECS, and data pipeline orchestration
- Data warehouse knowledge: Experience loading data into Redshift, Snowflake, or similar analytical databases
- Systems thinking: Ability to design resilient, observable streaming architectures that balance throughput, latency, and reliability
- Collaboration and communication: Strong written and verbal communication skills, with the ability to work autonomously while driving proactive collaboration in a remote environment
- Rust development experience or strong interest in learning Rust for high-performance systems
- Infrastructure as code: Experience with Terraform or similar IaC tools for managing cloud infrastructure
- Observability tools: Hands-on experience with Grafana, Prometheus, Loki, or similar monitoring/alerting platforms
- Event-driven architectures: Background designing event sourcing, CQRS, or other event-driven patterns
- Previous Pharmacy Benefits Manager (PBM) or healthcare technology experience