The Annual International Conference on Innovations in Infobusiness & Technology (ICIIT), organised by the Informatics Institute of Technology, was held on May 30–31, 2024, rebranded for this edition as the ICIIT Conclave 2024. The theme of the conclave was “Large Language Models and Generative AI.”
The conference covered a wide array of topics, including:
– Advanced prompt engineering techniques for LLMs
– Utilising Retrieval-Augmented Generation (RAG) to enrich LLM outputs
– Generating datasets with LLMs
– Developing and applying multimodal LLMs
– Enhancing reasoning and decision-making through AI methodologies
– Implementing LLMs across various domains
– Exploring transparency, explainability, and ethical considerations in LLM applications
– Scalability and maintenance challenges in LLM deployment
– Future directions and emerging trends in LLM technology
The Conclave featured distinguished keynote speakers who provided deep insights into the evolving landscape of artificial intelligence and its applications. These included Dr. Romesh Ranawana, Chairman of the National Committee to Formulate a Strategy for AI and Group Chief Analytics & AI Officer at Dialog, Prof. Nirmalie Wiratunga, and Dr. Stewart Massie from the Artificial Intelligence and Reasoning Group at Robert Gordon University. Dr. Ranawana delivered a speech titled “Generative AI: The Next Frontier in Creativity and Innovation,” highlighting the transformative potential of generative AI. Prof. Wiratunga and Dr. Massie presented “Enhancing Generative AI with Contextual Intelligence: The Role of Case-Based Reasoning in LLMs,” exploring the integration of contextual intelligence with large language models.
Representing LIRNEasia, junior researcher Chanuka Algama engaged with the academic research community to share insights and explore collaboration, in line with LIRNEasia’s commitment to advancing academic research on large language models and AI. After a thorough review process by an esteemed panel, Chanuka’s abstract titled “Conversational RAG with Memory-Based Context Enhancement” was selected for presentation at the ICIIT Conclave 2024. The presentation addressed the limitations of RAG in conversational settings and proposed a novel solution.
The research demonstrated superior performance over the inbuilt LlamaIndex query engine, contributing to the advancement of conversational AI technology. Retrieval-Augmented Generation (RAG) is a technique in natural language processing that enhances the capabilities of Large Language Models (LLMs) in conversational AI. RAG integrates a retrieval mechanism into the LLM’s text-generation process, allowing the model to fetch relevant information from a large collection of documents and produce more informed and contextually appropriate answers.
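To make this mechanism concrete, the snippet below is a minimal sketch of a standard RAG flow using LlamaIndex, the framework named later in this article. It is illustrative only: the directory path, the query, and the default embedding model and LLM configuration are assumptions, not details taken from the research.

```python
# Minimal RAG sketch with LlamaIndex (assumes llama-index >= 0.10 and a
# configured embedding model/LLM; the path and query are illustrative).
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()  # load source documents
index = VectorStoreIndex.from_documents(documents)     # chunk, embed, and index them
query_engine = index.as_query_engine()                 # retrieval + generation in one step

response = query_engine.query("What does the corpus say about data protection?")
print(response)
```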
While RAG has shown significant success in tasks requiring external knowledge and contextual understanding, it faces challenges in conversational settings, especially with follow-up queries, where capturing context and understanding complex questions is crucial. A follow-up such as “What about its limitations?” names no topic on its own, so retrieval over the raw query can surface irrelevant chunks. The challenge lies in ensuring that the retrieved information accurately reflects the user’s intent, as the proximity of text chunks in the embedding space does not guarantee a meaningful question-and-answer pair.
Key points of the research included:
– Limitations in RAG when applied to follow-up queries in a conversational setting, particularly in capturing context and understanding complex queries.
– Development of a plug-and-play conversational querying architecture.
– Three-phase pipeline: indexing, retrieval & generation, and conversation history management.
– Use of Mistral-7B-Instruct-v0.1 as a base model (a loading sketch follows this list).
– Evaluation of the pipeline against a ground-truth question-answer dataset.
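The abstract does not state how the 4-bit quantised Mistral-7B-Instruct-v0.1 model was loaded; as one common approach, the sketch below uses Hugging Face transformers with bitsandbytes 4-bit quantisation. Treat the tooling as an assumption, not a description of the authors’ setup.

```python
# Sketch: loading Mistral-7B-Instruct-v0.1 in 4-bit via transformers + bitsandbytes.
# This is one common way to obtain a 4-bit model, assumed for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.1"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16 over 4-bit weights
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on the available GPU(s)
)
```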
In this work, a plug-and-play conversational querying architecture was presented, using Mistral-7B-Instruct-v0.1 (specifically its 4-bit quantised version) as the base model. The pipeline is divided into three primary phases: indexing, retrieval & generation, and saving conversation history into memory for context enhancement. Initially, documents are segmented into text chunks, and their embeddings are stored in a vector database, facilitated by the LlamaIndex framework. Subsequently, user queries are matched against these embeddings, prompting the LLM to generate responses based on the retrieved contexts. Finally, conversation memory is leveraged, with the LLM temperature set to 0.0, to generate standalone questions for follow-up queries. Evaluation of the architecture against a ground-truth question-answer pair dataset revealed superior performance over the inbuilt LlamaIndex query engine.
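As a rough, framework-agnostic illustration of this three-phase idea (not the authors’ implementation), the sketch below keeps conversation history in memory, asks the LLM at temperature 0.0 to rewrite a follow-up as a standalone question, and then retrieves and answers against that rewritten query. The `llm_complete` and `retrieve` callables are hypothetical placeholders for whatever model and vector-store interfaces are actually used.

```python
# Framework-agnostic sketch of memory-based context enhancement for follow-up queries.
# `llm_complete(prompt, temperature)` and `retrieve(query)` are hypothetical placeholders.
from dataclasses import dataclass, field


@dataclass
class ConversationMemory:
    """Stores (question, answer) turns for context enhancement."""
    turns: list = field(default_factory=list)

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def as_text(self) -> str:
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)


CONDENSE_PROMPT = (
    "Given the conversation so far and a follow-up question, rewrite the "
    "follow-up as a standalone question that is fully understandable on its own.\n\n"
    "Conversation:\n{history}\n\nFollow-up question: {question}\n"
    "Standalone question:"
)


def answer_follow_up(memory, question, llm_complete, retrieve):
    """Condense the follow-up into a standalone question, retrieve context, and answer."""
    # Phase 3: use conversation memory; temperature 0.0 keeps the rewrite deterministic.
    standalone = llm_complete(
        CONDENSE_PROMPT.format(history=memory.as_text(), question=question),
        temperature=0.0,
    )
    # Phase 2: retrieve text chunks for the standalone question, then generate.
    context = "\n\n".join(retrieve(standalone))
    answer = llm_complete(
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {standalone}\nAnswer:",
        temperature=0.0,
    )
    memory.add(question, answer)
    return answer
```

In practice, `retrieve` would be backed by the vector index built in the indexing phase (for example, a LlamaIndex retriever), and `llm_complete` by the quantised Mistral model.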
A detailed article with more information on this research will be published in the next few weeks. Stay tuned!