iBusiness is a leading financial technology company transforming the way banks, credit unions, and lenders innovate. The company is seeking an experienced AI Knowledge Data Engineer to design, implement, and scale advanced AI systems, focusing on large language models (LLMs) and data pipelines that improve information retrieval across business applications.
Responsibilities:
- Architect, implement, and optimize retrieval-augmented generation (RAG) workflows by integrating local LLMs (e.g., Llama) with retrieval mechanisms (vector search, Elasticsearch, FAISS, Weaviate)
- Design, build, and maintain scalable data pipelines for ingesting, transforming, indexing, and retrieving structured and unstructured data from diverse sources
- Design, build, and scale addressable services and tool specifications that LLMs and agents can use to orchestrate workflows
- Orchestrate and scale training data operations, including data curation, versioning, and lineage tracking for large-scale LLM training and fine-tuning
- Develop and maintain ontologies, knowledge graphs, and semantic data models to structure and integrate domain-specific knowledge for improved retrieval and reasoning
- Implement and optimize knowledge retrieval strategies (dense/sparse retrieval, ranking algorithms) to maximize system accuracy and relevance
- Aggregate disparate knowledge bases and heterogeneous data into a unified layer that provides access to relevant contextual information
- Design cognitive memory systems for AI agents, enabling persistent knowledge retention and contextual awareness across interactions
- Collaborate with AI researchers, data scientists, and engineers to align knowledge architecture with business objectives and ensure data quality
- Evaluate and integrate new technologies and research advancements in LLMs, RAG, information retrieval, and knowledge representation
- Maintain clear and comprehensive documentation of models, pipelines, and workflows
Requirements:
- Bachelor's or Master's degree in Computer Science, Data Science, Machine Learning, or a related field
- Proven experience designing and scaling data pipelines and training data workflows for LLMs or similar AI systems
- Strong background in information retrieval systems, vector search technologies (e.g., FAISS, Pinecone, Elasticsearch, Milvus), and RAG frameworks
- Proficiency in Python and machine learning libraries such as TensorFlow and PyTorch
- Experience with ontologies, knowledge graphs, and semantic technologies (RDF, OWL, SPARQL)
- Familiarity with distributed data processing and orchestration tools (e.g., Spark, Airflow, Kubeflow)
- Excellent analytical, problem-solving, and communication skills
- Ability to work collaboratively in a cross-functional, fast-paced environment
- Experience with LLM fine-tuning, prompt engineering, and RAG optimization
- Familiarity with data-centric AI principles and training data quality assessment
- Experience with cloud platforms and scalable storage solutions
- Background in cognitive memory architectures or AI agent design