The Know is a venture-backed, early-stage company helping corporate executives make confident decisions in moments of uncertainty. The role involves owning the text intelligence stack, improving NLP systems, and translating customer needs into system improvements while mentoring junior teammates.
Responsibilities:
- Own our text intelligence stack end-to-end: improve and scale topic classification, complex opinion extraction, and emotion detection
- Build modern NLP/LLM systems in production: from tokenization through embeddings/vectorization and retrieval to classification/generation, with rigorous evaluation and monitoring
- RAG + vector space reasoning: design retrieval strategies, embedding/index choices, chunking/metadata schemes, and confidence and explainability methods for customer-facing outputs
- Emerging topic discovery: clustering + dimensionality reduction + labeling workflows to identify new topics on the fly and consolidate them into useful meta-topics
- Performance + cost discipline: optimize throughput, latency, and cloud spend while maintaining integrity
- Data storytelling: turn messy, high-volume text streams into clear narratives and visualizations customers can trust
- Partner on product + customer needs: translate customer goals into measurable system improvements and ship features from idea to production to iteration
- Mentor junior teammates: code reviews and light technical leadership, raising the bar without slowing shipping
Requirements:
- Strong applied NLP background, including:
  - Text preprocessing/tokenization
  - Embeddings/vectorization, transformer models
  - Clustering + topic modeling approaches, dimensionality reduction
  - Feature engineering, model selection, tuning, evaluation design
- Production mindset: shipped ML/NLP systems that run reliably, with monitoring, drift, and failure modes in mind
- Comfort building in real codebases: Python is a must; familiarity with JavaScript or other backend languages is highly preferred
- Data + systems fundamentals: database schema design and querying, efficient data pipelines, Git, basic Unix tooling
- Communication: explain methods, limitations, and confidence clearly to non-technical stakeholders
- Experience with AWS (serverless architecture, hosted ML models, non-relational databases)
- Experience with OpenAI / Claude / Gemini APIs, model routing
- Hugging Face ecosystem fluency (Transformers, Datasets, fine-tuning/inference)
- Some frontend familiarity (React) or strong data visualization chops (Plotly, D3)
- Experience in domains like news intelligence, social listening, elections/policy monitoring, trust/safety, public opinion, or risk analytics