A misconception is currently thriving in the industry: that one can become a Generative AI expert without learning “traditional” machine learning.
Large Language Models (LLMs) predict the next word in a sequence of words. Given a sequence, they calculate the probability of occurrence for every word in the vocabulary as the possible next word. In the simplest form, called greedy decoding, the word with the highest probability is selected, and repeating this step word by word generates coherent sentences. For example, if the sequence is “I love,” my guess is that the word with the highest likelihood of following will be “you.” The probabilities themselves are computed by the transformer’s attention mechanism and feed-forward networks.
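Here is a minimal sketch of that greedy selection step. The four-word vocabulary and the raw scores are invented for illustration; a real LLM scores tens of thousands of tokens.

```python
import numpy as np

# Toy sketch of greedy decoding. The vocabulary and the raw scores
# (logits) for the context "I love" are made up for illustration.
vocab = ["you", "pizza", "Mondays", "coding"]
logits = np.array([3.2, 1.1, 0.3, 1.7])

# Softmax turns raw scores into a probability distribution over the vocabulary.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Greedy decoding: pick the single most probable next word.
next_word = vocab[int(np.argmax(probs))]
print(dict(zip(vocab, probs.round(3))))  # e.g. {'you': 0.718, ...}
print("I love", next_word)               # -> I love you
```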
I assume we live in a world where people often express their love for each other online in text, even if we forget to say it in person often enough in a day. Now you know why the claim that Generative AI models have intelligence or can reason does not hold up: they predict the next word without understanding the sentence’s meaning. The source of an LLM’s knowledge is statistical patterns in text, not grammatical rules. In my opinion, we are still far away from AGI (Artificial General Intelligence). The leap from RNNs (Recurrent Neural Networks) to transformers was achieved by providing the complete sequence of words as input at once and processing them in parallel. Because parallel processing throws away word order, positional information is supplied separately so the model knows the sequence, and word order matters: a grammatically and structurally meaningful sentence needs its words in a defined sequence in any language.
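To make word order visible to a model that sees all words at once, each position gets an encoding vector added to that word’s embedding. A minimal sketch of the sinusoidal scheme from the original transformer paper (the sequence length and embedding size below are arbitrary choices for illustration):

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encodings (Vaswani et al., 2017).

    Each position gets a unique d_model-dimensional vector built from
    sines and cosines at different frequencies, so the model can tell
    position 0 from position 5 even though all words arrive in parallel.
    """
    positions = np.arange(seq_len)[:, np.newaxis]   # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]  # (1, d_model/2)
    angles = positions / (10000 ** (dims / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions get sines
    pe[:, 1::2] = np.cos(angles)  # odd dimensions get cosines
    return pe

pe = positional_encoding(seq_len=4, d_model=8)  # one row per word position
print(pe.shape)  # (4, 8); each row is added to the matching word embedding
```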
Billions of parameters (the weights and biases of the network) are learned during training, over the course of which the difference between the model’s predictions and the actual target values, measured by a loss function, steadily decreases. Optimisation is the process of reducing this difference by adjusting the weights and biases of the artificial neural network.
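A toy illustration of the idea, with one weight, a squared-error loss, and plain gradient descent rather than anything an actual LLM uses:

```python
# Toy optimisation: fit y = w * x to one (x, y) pair with gradient descent.
# Real LLM training does the same thing with billions of parameters and
# a cross-entropy loss over next-word predictions.
x, y = 2.0, 10.0   # input and target (the true w would be 5.0)
w = 0.0            # initial weight
lr = 0.05          # learning rate

for step in range(50):
    pred = w * x
    loss = (pred - y) ** 2      # squared difference: the "loss"
    grad = 2 * (pred - y) * x   # d(loss)/d(w)
    w -= lr * grad              # adjust the weight to reduce the loss

print(round(w, 3))  # approaches 5.0, i.e. the prediction approaches the target
```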
One could argue that since the training dataset contains records with a sequence of words as the input (independent variables) and the immediate next word as the output (target variable), building an LLM is a supervised learning process. However, the process involves no deliberate labelling step. Labelling means a domain expert assigning each record the best class from a set of classes for a categorical target variable, and those labelled records are then used for supervised model training. A generalised LLM, by contrast, simply takes textual input and generates content as instructed; no expert ever labelled anything.
In summary, this is what happens: 1) a huge amount of text is fed to the black box; 2) the black box builds a model; 3) the model generates text. Of course, the black box plays it smartly, programmatically converting the raw text into a supervised learning problem, as the sketch below shows. Looking only at the three steps above, though, it makes more sense to call this an unsupervised learning process. Let’s not fight over whether it’s supervised or unsupervised; there is a separate term for it: self-supervised learning.
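Self-supervision in miniature: the “labels” come from the text itself. The toy corpus and context size below are invented for illustration.

```python
# Build (context, next word) training records from raw text.
# No human labelled anything: the text supplies its own targets.
text = "i love you and i love pizza"
words = text.split()

context_size = 2
records = [
    (words[i : i + context_size], words[i + context_size])
    for i in range(len(words) - context_size)
]

for context, target in records:
    print(context, "->", target)
# ['i', 'love'] -> you
# ['love', 'you'] -> and
# ['you', 'and'] -> i
# ...
```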
LLMs are one example of Generative AI; Generative Adversarial Networks (GANs) are another. An LLM is a deep learning model, and deep learning models are artificial neural networks with many layers. As they say, what’s in a name? LLMs, Generative AI models, deep learning models, artificial neural networks, GANs, supervised learning models, and unsupervised learning models are all varieties of “traditional” machine learning models.
I would be highly impressed if someone told me they were exploring datasets with Python and SQL, as in the small sketch below, because they want to learn Generative AI. Generative AI does not exist in isolation; it stands on the foundations of data understanding, statistical thinking, and core machine learning principles. Mastering these fundamentals not only demystifies how models work but also empowers practitioners to build, evaluate, and innovate responsibly in this rapidly evolving space. True expertise begins long before the “generation” step. Writing a prompt, submitting it to ChatGPT, and getting a response is not learning Generative AI.
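For the avoidance of doubt, the kind of exploration I mean is ordinary and unglamorous. A tiny invented dataset stands in for whatever you are actually studying:

```python
import pandas as pd

# A small made-up dataset, purely for illustration.
df = pd.DataFrame({
    "review": ["great", "awful", "okay", None],
    "rating": [5, 1, 3, 4],
})

print(df.shape)          # how many rows and columns?
print(df.dtypes)         # what type is each variable?
print(df.isna().sum())   # where is data missing?
print(df.describe())     # distribution of the numeric variables
print(df["rating"].value_counts())  # balance of a categorical-ish column
```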
Disclaimer
Views expressed above are the author’s own.
