Design end-to-end NLP pipelines: entity extraction, terminological normalization, semantic matching and clustering;
Conduct exploratory phases (EDA, assessment of data quality, completeness and analytical feasibility) on structured and unstructured datasets;
Define modeling approaches balancing fine-tuning of Transformer models (BERTimbau and similar) and the use of LLMs for extraction/structuring, with clear criteria for reproducibility, cost and auditability;
Build embedding pipelines, RAG and semantic search using vector databases (Qdrant, Milvus, ChromaDB);
Calibrate prioritization scores and anomaly detection (Isolation Forest, Autoencoders, HDBSCAN) in collaboration with domain experts;
Version experiments and models, ensuring traceability and governance;
Produce high-quality technical and scientific documentation (reports and, when applicable, papers);
Serve as the technical point of contact for domain experts when validating criteria, thresholds and metrics.

Degree in Data Science, Statistics, Computer Science or a related field;
5+ years working on NLP projects in production, preferably in Portuguese;
Strong proficiency in Python, pandas, scikit-learn and PyTorch (Transformers);
Hands-on experience with Transformer models (BERTimbau, multilingual BERT);
Applied Generative AI: prompt engineering, RAG, structured outputs, embeddings and tool use;
Experience with Hugging Face transformers, spaCy and sentence-transformers;
Experience with vector databases (Qdrant, Milvus or ChromaDB) and similarity search;
Solid knowledge of CRISP-DM methodology and fundamentals of MLOps (MLflow);
Ability to communicate technical results to both technical and non-technical audiences;
Experience serving open-source LLMs (vLLM, Ollama, TGI, llama.cpp) in on-premise GPU environments;
Knowledge of GPU orchestration on Kubernetes (GPU pass-through, MIG, NVIDIA GPU Operator);
Scientific publications in NLP, ML or applied data science;
Experience with poorly standardized free-text datasets and the data quality challenges typical of natural language.

Senior AI Specialist / Data Scientist

Key skills