Quantiphi is an award-winning, AI-First global digital engineering company that helps Fortune 1000 organizations transform bold ideas into measurable business impact. They are seeking a highly skilled Architect - Platform Engineer to design, optimize, and scale infrastructure for GenAI and LLM workloads, collaborating closely with data science and application teams to bring AI solutions to life.
Responsibilities:
- Design and implement scalable infrastructure for LLM and GenAI workloads across multi-GPU environments
- Perform GPU profiling, benchmarking, and performance optimization for distributed training workloads
- Manage and schedule compute-intensive jobs using Slurm-based clusters and OpenShift/Kubernetes environments
- Enable and optimize the NVIDIA GPU stack (CUDA, cuDNN, NCCL, Triton, RAPIDS, etc.)
- Collaborate with cross-functional teams to deploy models in research and production environments
- Build and support GenAI pipelines (fine-tuning, RAG, multi-modal inference, LLMOps)
- Develop reusable infrastructure templates using tools like Terraform and Helm
- Contribute to internal innovation (PoCs, workshops) and support client-facing delivery engagements
Requirements:
- Strong experience with Slurm and distributed training environments
- Hands-on expertise with Red Hat OpenShift and/or Kubernetes
- Deep knowledge of the NVIDIA GPU ecosystem (CUDA, cuDNN, NCCL, Nsight, Triton/TensorRT)
- Strong foundation in Linux systems, performance tuning, and multi-GPU optimization
- Experience deploying GenAI workloads (LLM fine-tuning, RAG pipelines, multi-modal systems)
- Familiarity with Infrastructure-as-Code tools (Terraform, Ansible)
- Experience with cloud GPU environments (GCP, Azure, AWS, OCI) and/or on-prem GPU clusters
- Experience with NVIDIA NIM microservices, DGX systems, or GPU-accelerated containers
- Knowledge of LLMOps frameworks and MLOps integration
- Familiarity with vector databases and retrieval systems for RAG architectures
- Comfortable working in client-facing environments and collaborating with AI solution teams
- Experience working with FHIR R4, HL7 v2, or SMART on FHIR
- Experience integrating with EHR systems (e.g., Epic)
- Understanding of HIPAA compliance and healthcare data privacy
- Exposure to clinical workflows, CDS Hooks, or patient-facing applications
- Experience building clinical decision support systems or healthcare interoperability solutions