About this role

ProCogia is a data consulting company that helps businesses transform data into real growth, especially in high-stakes industries. They are seeking an LLM Research Intern to assess client-specific data, fine-tune large language models, and manage distributed training workflows while supporting the development of innovative data solutions.

Responsibilities:

Assess client-specific data assets and determine the appropriate adaptation strategy — continued pretraining, supervised fine-tuning, or a combination — based on the domain, data volume, and use case requirements
Curate, clean, structure, and prepare domain-specific datasets from raw client data for use in model training pipelines
Fine-tune large language models in the 70B–100B+ parameter range using techniques such as LoRA, QLoRA, and multi-adapter patterns
Perform continued pretraining on open-weight models (Qwen, Llama, and related ecosystems) to embed domain knowledge directly into model weights
Manage distributed training workflows across multi-node GPU clusters
Design and execute evaluation frameworks to validate domain adaptation quality, factual grounding, and model behavior
Support RAG system development where applicable, including vector database integration, chunking strategies, and reranking pipelines
Contribute to inference optimization and deployment pipeline integration
