Whether you are into data science or not, in the realm of Generative AI, you often encounter vital points such as pre-training, finetuning, and RAG. After working with fellow developers and seeing various people's thoughts on generative AI, more precisely LLM, I decided to write down my idea on how to know if you need a pre-training, finetuning, or RAG-based solution for your business needs.
Let's start with a hypothetical scenario. You are an AI developer. Your manager asked you to build an LLM-powered, customer-facing chatbot to meet organizational needs. This chatbot should understand input and respond to user queries. Ultimately, your management wants to reduce human interaction with customer inquiries and assign most of the work to this chatbot.
All right, this is a clear and direct requirement. All we need to provide is our solution to achieve this. Let's break down the question into some simpler blocks.
Note: For the sake of simplicity, all other software development-related processes are excluded
Okay. Based on the above two points, it should be chat-optimized and domain-adapted. With that, our goal has now been narrowed down to the question,
For clarification purposes, I will walk you through the workaround we might need for each of the three options we are interested in. Let's start with pre-training.
In the LLM (or any Gen AI) dialect, pre-training refers to training a vast neural network withtrillions of data from diverse resourceswith self-supervised learning. This process starts with a model with random weights. Eventually, it learns underlying text patterns and updates parameters accordingly. After the pre-training stage, the resulting large language model is rich in understanding language and linguistic patterns. Pretty cool, right? Not yet; the resulting model can only generate the next word (token) in an autoregressive manner; in other words, it's a foundational model. Therefore, our job is not done yet, as we need to finetune, more specifically, instruct-tune the resulting model to follow user intentions.
Here is an example from early versions of the GPT-3 model. Here, you can see the GPT-3 (foundational model) response and Instruct-GPT (their instruct-tuned version) response to the prompt "Why aren't birds real?"
As you can see, the instruct-tuned version is more aligned with user intention.
However, building a foundational model, as these models contain billions of parameters, comes with a massive computational cost. On top of that, we have to finetune the model further to follow user instructions. Due to these cost constraints, only a few companies owned large language models, at least as of the time this article was written. Considering our use case, chatbot development, you will realize that per-training is nowhere near the ideal solution.
Perfect!! Now, let's consider finetuning.
Finetuning of LLM can start from either the foundational or the instruct-tuned model. In both cases, the resulting model inherits capabilities from the base model. In LLM terminology, finetuning refers to adapting a pre-trained large language model to a specific domain or task. In contrast, the prerequisites of finetuning are having pre-trained LLM and question-and-answer pairs. Let's recall our previous topic, finetuning. There, we started with random weights. But in this stage, we are beginning with the pre-trained model. In other words, we have to make minor tweaks to the base model using our domain-specific dataset. Hence, it is called finetuning. Depending on how we update parameters in the base model, there are a few finetuning variants.
Let's switch back to our original question: is this suitable for our chatbot development?
I think… it is.
However, there are a few problems. First, we need a high-quality domain-specific dataset. Second, we might need to train a large language model periodically as our domain expands, say if we include a few products in our catalog. Third, we might encounter a common hallucination problem. If our dataset is not large enough or its coverage is low, the model will struggle around those edge cases and provide answers based on the knowledge grabbed from the pre-training stage. In your business, most of those responses will be nonsense. So, can we mitigate these problems? Of course, yes. We have RAG for our rescue.
Retrieval Augmented generation, or RAG, is a mechanism for connecting a specific knowledge base with a pre-trained/finetuned LLM. There are no more weight updates to the base model and no fancy deep-learning terminologies. You pluck an LLM suited for your use case and integrate it with the knowledge base you have; that's all. You have a fully customized chatbot. Since LLM uses context to answer questions most of the time, we can expect low hallucination here compared to the finetuning version.
I hope you now have a clear answer to our question. It's RAG, the most suited solution for our chatbot development. It can be easily connected with our external resources, hence more domain-specific, and it also uses LLM capabilities.
Nevertheless, in some cases, RAG is not the ideal solution. What if the user asks, "What is the current stock availability of the 65-inch Samsung QLED TV?" In this situation, first, we need to understand that we need to perform a database search to find the answer to the user's question, and second, we need to craft a proper answer mentioning our result. So, it sounds like RAG, but it's an improved or smarter version. Yes, we have a solution for that as well. It's called Agentic RAG. It uses tools and functions, and LLM can access these functions depending on the question it receives. By the way, that topic deserves a whole new article due to its prominence.
Let's summarize what we discussed so far in one diagram.
In this article, I aimed to provide better guidance on pre-training, finetuning, and RAG topics. Let me know your thoughts on this. Thanks for reading!!
All images, unless otherwise noted, are by the author.