Design, implement, and optimize document intelligence pipelines that leverage LLMs to extract, interpret, and structure information from unstructured and semi-structured documents (PDFs, images, forms, contracts, etc.).
Adapt existing foundational models to varied product requirements, ensuring alignment with business goals.
Collaborate with product managers, data scientists, and software engineers to integrate LLM-based automation into scalable solutions.
Create and architect interpreters and agentic systems.
Architect and implement document-centric RAG systems, enabling accurate retrieval and reasoning over large corpora of documents.
Develop systems for key-value extraction, table extraction, and entity recognition from complex documents (e.g., invoices, financial statements, contracts).
Research and evaluate new technologies and methodologies in the LLM space to continuously improve product automation.
Customize and fine-tune models to optimize performance for specific use cases.
Develop, test, and deploy LLM-based services in production environments.
Provide technical leadership and mentorship to other engineers on the team.
Ensure that LLM integrations are efficient, scalable, and secure, adhering to industry best practices.
Requirements
5+ years of experience with document processing systems, including OCR and text extraction tools (e.g., Tesseract, AWS Textract, Azure Form Recognizer, Google Document AI).
Experience working with unstructured and semi-structured data (PDFs, scanned images, forms).
Proven experience working with foundational models in production environments.
Strong programming skills in Python with experience in relevant ML libraries.
Hands-on experience in deploying machine learning models at scale.
Excellent communication and collaboration skills, with the ability to work cross-functionally.
Problem-solving mindset, with the flexibility to adapt models and workflows to evolving product needs.