What is AI? What is an LLM and generative AI, and how do they work?

WHAT IS AI? SOME KEY DEFINITIONS

"Artificial intelligence leverages computers and machines to mimic the problem-solving and decision-making capabilities of the human mind” 

IBM1

Artificial intelligence (AI) is an umbrella term for computer software which mimics human intelligence in order to perform tasks and learn from them. 

Machine learning (ML) is a subset of AI. Whereas computer programmes were traditionally given explicit, detailed code and instructions to follow, ML enables such programmes to take in a set of data, derive principles from that data, and subsequently apply those principles to new data.

Deep learning is a type of ML which utilises neural networks. A neural network is a series of algorithms which “learn” from large amounts of data to recognise relationships within data. In a deep learning model, multiple layers of such networks are used, and are structured to mimic the connections between neurons in a human brain.
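
To make the idea of layered networks more concrete, the following is a minimal sketch of a forward pass through a two-layer neural network, written in Python with NumPy (the library choice and the random toy numbers are assumptions for illustration, not taken from any particular model). Each layer multiplies its input by a set of learned weights and applies a non-linear function; training consists of adjusting those weights so that the output better matches the training data.

  import numpy as np

  def relu(x):
      # Non-linearity applied between layers
      return np.maximum(0, x)

  def forward(x, w1, b1, w2, b2):
      # Layer 1: weighted sum of the inputs, then a non-linearity
      hidden = relu(x @ w1 + b1)
      # Layer 2: weighted sum of the hidden layer gives the output
      return hidden @ w2 + b2

  rng = np.random.default_rng(0)
  x = rng.normal(size=(1, 4))                      # one input example with 4 features
  w1, b1 = rng.normal(size=(4, 8)), np.zeros(8)    # weights a trained network would have learned
  w2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

  print(forward(x, w1, b1, w2, b2))                # the (untrained) network's output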

Once trained, an algorithm can take in new data and produce outputs based on its training. AI algorithms can be split into three main categories: supervised learning, unsupervised learning, and reinforcement learning.

  • Supervised learning algorithms take labelled data as input and predict outcomes on new data. The “label” provides the correct output for a given piece of data (for example, if the data comprises pictures of cats, the label may be “cat”). 
  • Unsupervised learning algorithms take unlabelled data and can be used to spot patterns and provide insight into data; a short sketch contrasting supervised and unsupervised learning follows this list.
    • Semi-supervised learning combines the two previous concepts to get the benefits of both, often using a larger amount of unlabelled data to reduce the burden of sourcing a large amount of labelled data.
  • Reinforcement learning (RL) algorithms learn through feedback from their actions: instead of being fed correct answers by a supervisor, an RL algorithm learns from its mistakes and aims to maximise a reward function.
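
To illustrate the difference between the first two categories, here is a minimal sketch in Python using the scikit-learn library (the library and the toy numbers are assumptions chosen purely for illustration). The supervised model is shown labelled examples and then predicts labels for new data; the unsupervised model receives the same data without labels and simply groups it.

  from sklearn.cluster import KMeans
  from sklearn.linear_model import LogisticRegression

  # Supervised learning: every example comes with a label (1 = "cat", 0 = "not cat")
  features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
  labels = [1, 1, 0, 0]
  classifier = LogisticRegression().fit(features, labels)
  print(classifier.predict([[0.85, 0.15]]))  # predicted label for new, unseen data

  # Unsupervised learning: no labels are given; the algorithm finds groupings itself
  clusters = KMeans(n_clusters=2, n_init=10).fit_predict(features)
  print(clusters)                            # cluster assignment for each example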

WHAT IS GENERATIVE AI? HOW DO LLMs FIT IN?

Generative AI is a subset of AI capable of creating new content in the form of text, audio, imagery, and more. Its popularity has exploded in the last year or so through examples such as ChatGPT, Stable Diffusion, Dall-E, and many others.

Current generative AI utilises large language models (LLMs) in relation to text, and similar models in relation to images, music, etc. LLMs are a form of natural language processing (NLP), focusing on the processing and comprehension of human language.

Common examples of NLP use cases (many of which may not be considered to be “AI”) are predictive text, machine translation, and smart assistants such as Siri and Alexa. More advanced recent implementations include ChatGPT.

HOW DO THEY WORK?

LLMs utilise deep learning techniques (neural networks) to understand, summarise, generate, and predict new text content2. At a very high level, LLMs take in large amounts of training text and process it to extract key principles from each piece of training data. When asked to provide an output (for example, when a user asks ChatGPT a question), the LLM relies on the key principles derived from the dataset to provide that output. In ChatGPT’s case, for example, the LLM effectively does nothing more than repeatedly predict the next word in the sentence, without any concept of the meaning of the words it is outputting.
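
As a concrete illustration of “predicting the next word”, the sketch below uses the Hugging Face transformers library with the publicly available GPT-2 model and PyTorch (these choices are assumptions; the article does not specify any particular model or toolkit). It simply asks the model to rank candidate next tokens for a prompt.

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  tokenizer = AutoTokenizer.from_pretrained("gpt2")
  model = AutoModelForCausalLM.from_pretrained("gpt2")

  prompt = "The cat sat on the"
  inputs = tokenizer(prompt, return_tensors="pt")

  with torch.no_grad():
      logits = model(**inputs).logits      # a score per vocabulary token, per position

  next_token_scores = logits[0, -1]        # scores for the token following the prompt
  probabilities = torch.softmax(next_token_scores, dim=-1)
  top = torch.topk(probabilities, k=5)

  for prob, token_id in zip(top.values, top.indices):
      # The model only ranks likely continuations; it has no notion of meaning
      print(f"{tokenizer.decode(int(token_id))!r}: {prob.item():.3f}")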

More specifically, LLMs use a transformer-based architecture comprising a “self-attention mechanism” which provides context by placing different weights, relating to relevance, on different parts of the input data3. This mimics the “cognitive attention” observed in the human mind. 
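
The attention calculation itself is compact enough to sketch. The example below, in Python with NumPy (the tiny random matrices are purely illustrative assumptions), computes scaled dot-product self-attention: each position in the input produces a query, a key, and a value, and the resulting weights determine how much each position draws on every other position.

  import numpy as np

  def softmax(x):
      e = np.exp(x - x.max(axis=-1, keepdims=True))
      return e / e.sum(axis=-1, keepdims=True)

  def self_attention(Q, K, V):
      d_k = K.shape[-1]
      scores = Q @ K.T / np.sqrt(d_k)   # pairwise relevance between positions
      weights = softmax(scores)         # the "different weights, relating to relevance"
      return weights @ V                # each output is a weighted mix of the values

  rng = np.random.default_rng(0)
  Q = rng.normal(size=(3, 4))           # 3 tokens, each with a 4-dimensional query
  K = rng.normal(size=(3, 4))
  V = rng.normal(size=(3, 4))
  print(self_attention(Q, K, V).shape)  # (3, 4): one context-aware vector per token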

The exact architecture chosen will depend on the task. For example, Google’s BERT uses a bidirectional transformer model, enabling the consideration of words either side of the “missing” word (the word that is to be predicted), making BERT well-suited to NLU (Natural Language Understanding) tasks. The GPT models use an autoregressive transformer model, considering only the words that precede the missing word. This architecture is therefore much better suited to language generation tasks3.
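
One way to picture the difference is through the attention mask each architecture uses. The minimal sketch below (NumPy again; the token list is an illustrative assumption) shows that a bidirectional, BERT-style model may attend to words on both sides of a position, whereas an autoregressive, GPT-style model is restricted to the words that came before.

  import numpy as np

  tokens = ["the", "cat", "sat", "on", "the", "[MASK]"]
  n = len(tokens)

  # Bidirectional (BERT-style): every token may attend to every other token
  bidirectional_mask = np.ones((n, n), dtype=int)

  # Autoregressive (GPT-style): each token may attend only to itself and earlier tokens
  causal_mask = np.tril(np.ones((n, n), dtype=int))

  print(bidirectional_mask)
  print(causal_mask)   # lower-triangular: no "looking ahead" at later words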

An LLM is typically trained using a large corpus of text data (such as the “Common Crawl” open repository, which maintains petabytes of plain text crawl data). One training mechanism used is the self-supervised approach: the training is “supervised” in the sense that a correct answer is available, but that answer lies in the data itself (i.e. the next word in the sentence), so no manual labelling is required4. Another training mechanism is Reinforcement Learning from Human Feedback (RLHF), a type of RL which uses a reward function produced by training a reward model on human feedback5.
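
To show what “the correct answer lies in the data itself” means in practice, the sketch below (plain Python, with a single sentence standing in for a web-scale corpus) generates next-word training examples automatically, with no human labelling involved.

  text = "the cat sat on the mat"
  words = text.split()

  # Each training example pairs a context with the word that actually follows it;
  # the "label" comes straight from the text, so no manual annotation is needed.
  training_pairs = [(words[:i], words[i]) for i in range(1, len(words))]

  for context, next_word in training_pairs:
      print(f"{' '.join(context):<20} -> {next_word}")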

This architecture and training give LLM-powered chatbots such as ChatGPT the ability to relate words and phrases together and predict what should come next. Importantly, ChatGPT has no creative ability or free thought surrounding the information it relays to a user; it is simply very well trained in understanding the relationships between words and phrases, and can generate an appropriate string of words based on the data it has been trained on. This means that its outputs can be inaccurate, lacking in nuance, or biased, so relying on such models as a source of reliable information is risky.

The release of GPT-4, with its ability to handle image inputs and produce more reliable outputs6, indicates rapid improvement in generative AI models, which could fuel wider adoption in industry in the coming years.

Interested in learning more about AI innovation? See related articles in our new AI Hub. Have a potential AI innovation of your own? Explore what is protectable, and how, in our next article: “What rights might subsist in AI itself? How can you protect those rights?”

Potter Clarkson’s specialist electronics and communications team includes a number of attorneys with extensive experience in software and AI inventions. If we can help you with an issue relating to the protection and commercialisation of innovation in any area of artificial intelligence, please get in touch.

SOURCES / FURTHER READING

  1. What is Artificial Intelligence (AI)? | IBM
  2. What is a large language model (LLM)? – TechTarget Definition
  3. What Is the Transformer Architecture and How Does It Work? (datagen.tech)
  4. Different ways of training LLMs (and why prompting is none of them) | Dorian Drost | Towards Data Science
  5. Introducing ChatGPT (openai.com)
  6. GPT-4 (openai.com)