Large Language Models (LLMs) have become the industry’s constant companions. Even when they are seen as disruptors that make the landscape volatile, they remain a learner’s paradise. At the risk of oversimplifying, in my view, LLMs can be used in three ways: 1) LLMs as foundation models, 2) fine-tuned LLMs, and 3) LLMs with RAG.

Way 1: LLMs as foundation models

Let’s decode the terminology. LLMs are “large”. Their inherent capability is in “language”. They are a machine learning “model”. LLMs are supposed to cater to a wide variety of language requirements and tasks. 

Originally, they were not meant to be domain- or task-specific. Ask them for the recipe for onion pakoda, and they will share it. Ask them why the sky is blue, and they will oblige. Instruct them to define a computer, and they will do so obediently. Hence, large.

You don’t have to follow a strict syntax to interact with them. You can explain what you want in your own natural language, and the LLM will interpret the request correctly. It will also generate its response in user-friendly, readable language. Hence, language.

At its core, an LLM is a multi-class classification machine learning model: at each step, it predicts the next token. The possible values of the target variable are the distinct tokens of the language, a set called the vocabulary. Hence, model.
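The next-token view above can be sketched in a few lines. This is a toy illustration, not a real LLM: the vocabulary and the logits (raw scores) below are made-up numbers, standing in for what a trained model would actually produce for a prompt like "the sky is".

```python
import math

# Toy illustration (made-up numbers, not a real LLM): next-token
# prediction is a multi-class classification over the vocabulary.
vocabulary = ["blue", "green", "pakoda", "computer"]
logits = [4.0, 1.5, -2.0, -1.0]  # one raw score per vocabulary token

# Softmax turns the raw scores into a probability for each class (token).
exps = [math.exp(x) for x in logits]
total = sum(exps)
probs = [e / total for e in exps]

# The "predicted class" is simply the most probable next token.
next_token = vocabulary[probs.index(max(probs))]
print(next_token)  # "blue" has the highest score, so it wins
```

Real models do exactly this, only with vocabularies of tens of thousands of tokens and scores computed by billions of parameters.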

Using a foundation model as-is leverages all three of these aspects of a large language model.

Way 2: Fine-tuned LLMs

However, we quickly realised that such a generic model might not be useful in all scenarios and for all kinds of requirements. We started expecting LLMs to be specialists: one LLM to be a healthcare specialist, another a finance specialist, and yet another an expert in marketing. To achieve that, we started fine-tuning a foundation LLM with additional training records from the specific field we wanted it to master. We fine-tuned the same foundation LLM in a supervised manner (called SFT) using healthcare data, finance knowledge, and marketing data, respectively, to make it an expert in healthcare, finance, and marketing.

In the process, we typically modify certain upper layers of the neural network and partially change the model’s parameters (weights). While doing this, we must take care not to destroy or diminish the LLM’s inherent language interpretation and generation capabilities. Otherwise, it will no longer remain a good language model; it will just be a finance expert who doesn’t know how to communicate!

Way 3: LLMs with RAG

Fine-tuning requires time and money. It requires careful preparation of a training dataset from the domain, and it risks degrading the model’s language capabilities. Enter Retrieval Augmented Generation (RAG). With RAG, the system retrieves relevant context from a local or external knowledge source and supplies it to the LLM alongside the user’s question. It is economically beneficial because it avoids the cost and time of re-training, and it retains the foundation model’s language capabilities because the RAG approach neither re-trains the model nor modifies any of its parameters.
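The retrieve-then-augment flow can be sketched end to end. This is a toy example with a made-up three-document knowledge base and a naive word-overlap scorer (real systems use embeddings and vector search); the LLM call itself is left out, since the whole point is that the model is used unmodified.

```python
import string

# Toy local knowledge source (made-up snippets, not real documents).
knowledge_base = [
    "Onion pakoda is a fried snack made from onions and gram flour.",
    "The sky appears blue because air scatters short wavelengths most.",
    "A computer is a machine that executes programmed instructions.",
]

def retrieve(question, documents):
    """Return the document sharing the most words with the question.
    (A naive stand-in for embedding-based similarity search.)"""
    strip = str.maketrans("", "", string.punctuation)
    q_words = set(question.lower().translate(strip).split())
    return max(documents,
               key=lambda d: len(q_words & set(d.lower().translate(strip).split())))

question = "Why is the sky blue?"
context = retrieve(question, knowledge_base)

# Augment the prompt with the retrieved context; generation then happens
# in the unmodified foundation model, whose weights are never touched.
prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"
print(prompt)
```

The only thing that changes between domains is the knowledge base, which is exactly why RAG is cheaper and safer than re-training.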

Conclusion

In practice, most real-world solutions don’t rely on just one of these three ways in isolation. Foundation models give you a strong starting brain, fine-tuning helps that brain speak the language of a specific domain, and RAG keeps it grounded in the latest, most relevant knowledge. The real art lies in choosing the right mix for your problem, budgets, and constraints. Starting simple with a foundation model and RAG, then fine-tuning when you need extra accuracy or a distinctive voice, can sometimes be the best approach for your organisation’s use case.



Disclaimer

Views expressed above are the author’s own.





