Prolific is building the biggest pool of quality human data in the world, and they are seeking AI and Machine Learning Engineers to join their Expert Network. In this role, you will apply deep technical expertise to help train and evaluate the next generation of LLMs, ensuring the accuracy and effectiveness of AI models.
Responsibilities:
- Evaluate LLM Architecture Logic: review AI-generated explanations of model architectures, loss functions, and backpropagation for technical accuracy
- Audit Code & Notebooks: validate ML-specific code (e.g., training loops, data preprocessing scripts, or model evaluations) for efficiency and correctness
- Refine RLHF Frameworks: provide the high-quality human feedback needed to align models with human intent and make them safer and more helpful
- Analyze Model Reasoning: critically assess how an AI model navigates complex chain-of-thought (CoT) prompts and identify where the reasoning breaks down
- Benchmark Performance: conduct comparative testing between different model outputs based on specific technical taxonomies and performance metrics
Requirements:
- a BS, MS, or PhD in Computer Science, Artificial Intelligence, Robotics, or a related quantitative field with a focus on Machine Learning
- experience building, deploying, or fine-tuning ML models in a production environment
- professional-level understanding of neural network architectures (Transformers, CNNs, RNNs) and optimization techniques
- hands-on experience with Prompt Engineering, RLHF (Reinforcement Learning from Human Feedback), or RAG (Retrieval-Augmented Generation) workflows
- the ability to audit complex model logic, identify training data contamination, and evaluate mathematical proofs behind ML algorithms
- strong attention to detail in spotting hallucinations, biased outputs, or logical failures in AI-generated technical content
- expert proficiency in PyTorch or TensorFlow/Keras
- advanced proficiency in Python (NumPy, pandas, scikit-learn) and experience with Hugging Face Transformers
- experience with AWS (SageMaker), Google Cloud (Vertex AI), or specialized tools like Weights & Biases and LangChain
- familiarity with Pinecone, Milvus, or Weaviate for RAG evaluation