Deloitte is leading an AI-first initiative aimed at transforming the healthcare decision-making process through advanced modeling and reasoning systems. As a Research Engineer, you will design, train, and evaluate models that enhance clinical and operational decision-making, focusing on post-training methodologies and ensuring model behavior aligns with healthcare standards.
Responsibilities:
- Design and execute post-training pipelines: supervised fine-tuning (SFT), preference optimization, and reinforcement learning / alignment workflows
- Build and optimize training workflows using techniques such as SFT, RLHF, PPO, DPO, GRPO, RLAIF, and Constitutional AI, and understand how each affects reasoning quality, safety, latency, cost, and reliability
- Train reasoning models for healthcare decisioning using verifiable-reward RL - designing reward signals and verifiers grounded in clinical guidelines, policy and criteria, and adjudicated outcomes
- Develop reward models and preference datasets to improve reasoning quality, factuality, safety, policy adherence, and task performance
- Curate, clean, synthesize, and evaluate large-scale instruction, preference, and domain-specific datasets, with rigorous filtering, deduplication, and quality control
- Build verification and reward pipelines from our proprietary clinical, claims, and operational data and from clinical-expert labeling - turning guidelines, policy, and adjudicated outcomes into checkable reward signals at scale
- Implement parameter-efficient fine-tuning (PEFT) strategies such as LoRA, QLoRA, and adapter-based approaches; build scalable distributed training using DeepSpeed, FSDP, Megatron-LM, Ray, or equivalent
- Optimize inference performance - latency, throughput, quantization, and deployment efficiency - for production, including frameworks such as vLLM, TensorRT-LLM, or TGI
- Train and optimize open-weight models such as Llama, Qwen, Mistral, or DeepSeek; build specialized small language models (SLMs) for on-premise and cloud-hybrid deployment with strong performance-per-dollar
- Design evaluation frameworks covering reasoning, hallucination detection, factuality, instruction following, structured outputs, and domain-specific metrics
- Build healthcare-grade evaluation - held-out clinical benchmarks, deployment regression gates, calibration and uncertainty, factuality against ground truth, and bias/fairness evaluation across patient populations and subgroups - co-designed with clinical experts
- Apply PHI/HIPAA-aware data handling and produce model documentation suitable for regulated clinical use
- Perform red teaming and adversarial testing to identify alignment failures, unsafe behaviors, jailbreak vulnerabilities, and regression risks; collaborate with agentic and application teams to improve tool use, grounding, and long-horizon reasoning
Requirements:
- Bachelor's degree in Computer Science, Machine Learning, Artificial Intelligence, Applied Mathematics, Computational Linguistics, or a related field
- Demonstrated depth in training and post-training large transformer-based language models in production or research; this is your craft, not coursework or a one-off fine-tune. That depth should include SFT and at least one preference-optimization or RL method, evidenced by shipped models, releases, or research
- Hands-on experience with reasoning-model training and/or verifiable-reward (RLVR) workflows
- Strong understanding of modern post-training techniques: SFT, RLHF, PPO, DPO, GRPO, RLAIF, and preference optimization workflows
- Experience with open-weight foundation models such as Llama, Qwen, Mistral, DeepSeek, or equivalent architectures
- Strong expertise in PyTorch and modern deep-learning tooling; experience with distributed training frameworks such as DeepSpeed, FSDP, Megatron-LM, or Ray
- Experience implementing parameter-efficient fine-tuning (PEFT) techniques such as LoRA and QLoRA, and quantization-aware workflows
- Deep understanding of transformer architectures, tokenization, attention mechanisms, decoding strategies, and model scaling trade-offs
- Strong grasp of LLM evaluation methodologies, benchmarking, reward modeling, and alignment trade-offs; experience with large-scale and synthetic datasets, filtering, deduplication, and quality-control pipelines
- Strong Python engineering skills and production-grade software practices; ability to work through ambiguous, highly complex technical problems in fast-moving environments
- Ability to travel 0-50%, on average, based on the work you do and the clients and industries/sectors you serve
- Limited immigration sponsorship may be available
Preferred Qualifications:
- Experience building or optimizing reasoning models, agentic models, or tool-using LLM systems
- Familiarity with inference optimization frameworks such as vLLM, TensorRT-LLM, TGI, or Ollama
- Experience with multimodal models, speech models, or domain-specific foundation models; experience using large-scale GPU clusters and distributed compute
- Contributions to open-source AI projects, research publications, benchmark development, or model releases
- Familiarity with safety, governance, and responsible-AI practices; experience in regulated or high-stakes industries such as healthcare, finance, insurance, or public sector