ProCogia is a data consulting company that helps businesses turn data into growth, especially in high-stakes industries. The company is seeking an LLM Research Intern to assess client-specific data, fine-tune large language models, and manage distributed training workflows while supporting the development of new data solutions.
Responsibilities:
- Assess client-specific data assets and determine the appropriate adaptation strategy (continued pretraining, supervised fine-tuning, or a combination) based on the domain, data volume, and use case requirements
- Curate, clean, structure, and prepare domain-specific datasets from raw client data for use in model training pipelines
- Fine-tune large language models in the 70B–100B+ parameter range using techniques such as LoRA, QLoRA, and multi-adapter patterns
- Perform continued pretraining on open-weight models (Qwen, Llama, and related ecosystems) to embed domain knowledge directly into model weights
- Manage distributed training workflows across multi-node GPU clusters
- Design and execute evaluation frameworks to validate domain adaptation quality, factual grounding, and model behavior
- Support RAG system development where applicable, including vector database integration, chunking strategies, and reranking pipelines
- Contribute to inference optimization and deployment pipeline integration