Convoy Health is an early-stage, venture-backed healthcare AI company focused on transforming complex healthcare data into actionable intelligence. As a Data Scientist, you will own the ML models and statistical systems that drive the platform's intelligence layer, building models across claims, revenue cycle, scheduling, and financial data that improve decision-making and operational efficiency.
Responsibilities:
- Build forecasting models for financial planning: revenue projections by payer/contract/service line, cost trend forecasting, TCOP (total cost of patient) projections, budget variance prediction, and capitation rate modeling
- Design and maintain benchmarking algorithms: peer cohort construction from multi-dimensional provider/org attributes, percentile distribution computation, CMS national benchmark integration, and outlier detection for performance management
- Build visit optimization models: slot utilization analysis, no-show prediction, revenue-per-visit optimization, scheduling pattern analysis, and capacity planning
- Develop VBC performance scoring: quality measure forecasting, shared savings projections, risk adjustment optimization (CMS-HCC), attribution modeling, and contract performance simulation
- Build and maintain detection models covering DOFR routing misdirection, duplicate detection (LSH), CC/MCC/SOI severity inflation (Isolation Forest + XGBoost), procedure code integrity (NCCI rules + XGBoost), discharge/transfer violations (PACT rules + SQL lookback), and contract rate misapplication (range checks + CUSUM drift)
- Calibrate dollar-weighted work queues (P1/P2/P3 tiers) to surface high-value flags, optimizing for precision over recall
- Design feedback loops: confirm/clear workflows with structured reason codes, dollar impact tracking, monthly model retraining, weekly signature updates, and quarterly threshold recalibration
- Implement multi-tier population surveillance systems: descriptive statistics, CUSUM control charts, peer group z-scores feeding heightened scrutiny lists, and time-series forecasting with anomaly detection
- Monitor model and population drift via statistical process control, triggering retraining when sustained shifts are detected
- Build anomaly detection across multiple data domains — not just claims, but also revenue cycle metrics, scheduling patterns, GL variances, and operational KPIs
- Build SHAP-based explainability for all ML models and work with the AI agent team to synthesize plain-language decision narratives — "what happened, why, what it costs, what to do"
- Maintain vector-embedded error signature libraries for semantic matching of known patterns against new data
- Design model outputs specifically for agentic consumption — autonomous AI workflows call your models via tool use, so output schemas, confidence calibration, and actionability scoring directly impact agent quality
- Contribute to decision intelligence prompts: help design the reasoning chains that translate model outputs into recommended actions
- Use AI-assisted development tools daily for model prototyping, feature engineering, code generation, and analysis
- Apply agentic AI patterns: your models are consumed by autonomous agents, not just displayed on dashboards, so you'll design for tool-use invocation, multi-step reasoning chains, and human-in-the-loop review workflows
- Collaborate with the engineering team on agent action groups, ensuring model outputs are structured for agentic consumption
Requirements:
- 3+ years of applied ML experience, including production model deployment (not just notebooks)
- Strong experience with gradient boosting (XGBoost/LightGBM): training, hyperparameter tuning, feature engineering, and SHAP interpretation
- Experience with anomaly detection: Isolation Forest, CUSUM/statistical process control, or similar unsupervised methods
- Experience with time-series forecasting (Prophet, ARIMA, or similar) for financial or operational projections
- Proficiency in Python (scikit-learn, XGBoost, pandas, numpy) and SQL for feature engineering against large datasets
- Experience with model evaluation in precision-critical settings: understanding of precision/recall tradeoffs, threshold calibration, and capacity-constrained optimization
- Familiarity with vector embeddings and similarity search (sentence-transformers, FAISS, pgvector, or similar)
- Comfort with AI-assisted development — you use AI tools daily for prototyping, analysis, and code generation
- Understanding of agentic AI patterns — you know how autonomous agents consume model outputs and can design for that consumption pattern
- Ability to communicate model behavior to non-technical stakeholders — you'll work closely with finance and operations leaders
Preferred Qualifications:
- Experience deploying ML models consumed by autonomous AI agents in production, or building production RAG/LLM systems
- Healthcare domain experience: claims data, DRG groupers (MS-DRG, APR-DRG), NCCI edits, CMS payment policies, ICD-10, CPT/HCPCS coding, VBC contracts, risk adjustment
- Experience with multiple healthcare data types beyond claims: EMR/clinical data, revenue cycle metrics, financial/GL data, scheduling/operational data
- Experience building ML systems consumed by AI agents or autonomous workflows (tool-use patterns, structured output design, confidence calibration for agentic consumption)
- Experience with LSH / MinHash for near-duplicate detection
- Familiarity with model registries (MLflow or similar)
- Experience with cloud-based ML pipeline orchestration (SageMaker, Step Functions, Vertex AI, or similar)
- Exposure to HIPAA/BAA compliance requirements for ML systems handling PHI
- Knowledge of CMS PACT transfer policy, NCCI bundling rules, or physician fee schedule RVU calculations
- Founding DS / first-DS-hire experience at a VC-backed startup, or Member of Technical Staff at an AI lab
- Prior experience at a healthcare AI / data company (Tempus AI, Flatiron, Komodo, Verily, Lightbeam, Lyra, Holmusk, etc.)