Bombardier is a leading designer, builder, and maintainer of high-performance aircraft. The company is seeking a candidate to lead improvements in retrieval quality and advance OCR and document understanding within its Artificial Intelligence and Machine Learning team.
Responsibilities:
- Lead improvements to retrieval quality and ensure they are implemented across the company where applicable: strengthen hybrid BM25 + dense retrieval, add robust metadata filtering, and implement/compare rerankers (cross-encoder or lightweight LLM-as-reranker) while iterating on existing BGE pipelines
- Redesign chunking and indexing for PDFs (overlap/hierarchical, section-aware/semantic); build benchmarks to compare strategies; introduce dedup/versioning and maintain document lineage with structured citations
- Expand evaluation beyond Recall@k (e.g., nDCG, MRR, Precision@k) and stand up a continuous evaluation pipeline with meaningful telemetry/logging (OpenTelemetry is a plus)
- Advance OCR & document understanding: use PaddleOCR for scanned PDFs; evaluate advanced LLM-based OCR approaches (local/on‑prem only); expand table/diagram extraction and prepare for multimodal retrieval
- Contribute to a modern frontend: help migrate from Streamlit to React/Next.js + TypeScript (Node.js) with secure sign-in and PDF snippet highlighting
- Strengthen security & platform foundations: implement access control with Azure AD (Entra ID) or LDAP; work comfortably in Azure; uphold Canada-only data residency and a no-external-calls policy in design and deployment
- Elevate developer experience: drive code reviews, testing, and CI/CD workflows with GitHub Actions