LLMs (large language models): What are they, and how do they work?

Generative artificial intelligence tools are evolving rapidly: in just two months, ChatGPT reached 100 million users. Other platforms such as TikTok took at least nine months. To grasp the potential of these applications, we first need to understand the large language models that drive them.

What do the letters 'GPT' stand for in the name of ChatGPT, the generative AI tool? They stand for 'Generative Pre-trained Transformer.' 'Transformer' refers to the neural network architecture on which it is built, first described in 2017 in the paper 'Attention Is All You Need.' 'Pre-trained' and 'Generative' denote its nature as a large language model (LLM): a model that has been pre-trained on a given data set and can generate new information.

But what exactly are LLMs? "LLMs are models pre-trained using a technique called machine learning. Text corpora containing billions of words are analyzed to learn patterns of language, grammar and context," explains Curro Maturana, Global Head of GenAI at BBVA. "The major departure from classical artificial intelligence models is that LLMs are self-supervised. There is no prior data labeling."
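The self-supervision idea can be sketched in a few lines of code (all names and the sample sentence below are illustrative): the training targets are derived from the raw text itself, so no human annotation is required; each word serves as the label for the words that precede it.

```python
# A minimal sketch of self-supervised data preparation (illustrative only):
# the "labels" come from the raw text itself, with no human annotation.
text = "the cat sat on the mat"
words = text.split()

# Each training example pairs a context with the word that actually follows it.
pairs = [(words[:i], words[i]) for i in range(1, len(words))]

for context, target in pairs:
    print(" ".join(context), "->", target)
# the -> cat
# the cat -> sat
# ...
```

A real LLM builds examples like these from trillions of tokens of text, which is what makes training at scale possible without manual labeling.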

This training enables LLMs to perform language-related tasks, such as translation, content creation, summarizing and conversation, with human-like accuracy and fluency. GPT-4, the LLM currently powering ChatGPT, was further tuned by reinforcement learning from human and AI feedback.

Since the Turing test was first proposed in the 1950s, scientists have explored the idea of humans having conversations with computers without realizing they're talking to a machine. As a result of this vision, the first chatbot in history, Eliza, was created in 1966.

After decades of development, this interaction has been refined so much that platforms like ChatGPT, Bing AI, and Bard can now generate original content in response to a human prompt. This is all made possible by the Transformer architecture that underpins large language models (LLMs).

How a large language model works

Large language models are a form of natural language processing (NLP): they function as neural networks that learn from context, from the content itself, and from the analysis of word sequences.

By operating on a large scale with billions of parameters, large language models (LLMs) enable AI to generate content similar to what a person would create, as seen with ChatGPT. These models are evolving to use different types of input data to produce various outputs, including audio, images, video, and even 3D content, and are therefore within the scope of Generative AI. To understand how they work, it's essential to unpack each term in the 'LLM' initialism:

  • Large. The term "large" refers to the vast number of parameters and the volume of text used to train and feed the model. For instance, Google's BERT, an early example of a large language model (LLM), uses 110 million parameters. GPT-3, released in 2020, expanded this to 175 billion parameters. The exact number of parameters in GPT-3.5 and GPT-4 remains undisclosed, but some experts estimate that GPT-4 has 600 times the capacity of GPT-3, which would put it at roughly 100 trillion parameters.
  • Language. This term refers to pattern recognition based on human language extracted from web pages, books, online media articles, and other types of documents.
  • Model. This term refers to the probabilistic mathematical model at the core of the large language model. Essentially, an LLM calculates the probability that a given word follows a string of words (the 'prompt'). Using an attention mechanism, it weighs each new 'token' (a word or part of a word) against the surrounding context, which allows it to produce fluent, grammatically coherent text in many languages.
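The next-word probability idea in the 'Model' point above can be sketched with a toy example. The vocabulary and the raw scores ("logits") below are invented for illustration; in a real LLM, a neural network with billions of parameters produces these scores, and a softmax step like the one shown turns them into probabilities.

```python
import math

# Invented vocabulary and raw scores ("logits") for the token that might
# follow the prompt "the cat sat on the". In a real LLM these scores come
# from a neural network; here they are hand-picked for illustration.
vocab = ["mat", "dog", "moon", "roof"]
logits = [3.2, 0.5, 0.1, 1.8]

def softmax(scores):
    """Turn raw scores into a probability distribution that sums to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)
for token, p in zip(vocab, probs):
    print(f"P({token!r} | 'the cat sat on the') = {p:.3f}")

# The model would pick (or sample from) the most probable tokens: here, 'mat'.
print("predicted next token:", vocab[probs.index(max(probs))])
```

Generating a full sentence is just this step repeated: the chosen token is appended to the prompt and the model predicts again, one token at a time.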

Examples of large language models

Currently, there is a wide range of active and developing LLMs. Some of the most prominent are:

  • GPT-4. Introduced in March 2023, this large language model demonstrates a deep understanding of complex text. It represents the next generation of LLMs with multimodal capabilities (Multimodal Large Language Model or MLLM): in addition to processing text, it can interpret information from other sources, such as images.
  • BERT. "Bidirectional Encoder Representations from Transformers," or "BERT," is a family of LLMs developed by Google. Unlike models that process words in isolation, BERT excels at understanding the meaning of words in context and grasping the relationships between them.
  • PaLM 2. This evolution of PaLM ('Pathways Language Model'), which had more than 500 billion parameters, was also developed by Google. The model is capable of understanding complicated language sequences such as riddles or idioms.

Additionally, open-source LLMs deserve mention. These publicly accessible models can be used by developers and researchers to improve or modify them. Notable examples include BLOOM, which can generate text in 59 languages, and Llama 2, developed by Meta and released in partnership with Microsoft.

LLM use cases

As language models increase in size, so do their capabilities. Broadly speaking, LLM use has expanded in the following fields:

  • Content and product generation. This is one of the avenues many companies are increasingly exploring. Large language models can analyze vast amounts of data to create personalized recommendations and tailored content for each customer.
  • Categorizing and summarizing information. Large language models can also be used to categorize and summarize content. Legal departments, for example, are leveraging this capability to find the case law relevant to each situation, with extensive training used to keep error rates low.
  • Translation. Large language models are not only useful for translations between different natural languages, but also between programming languages for organizations that, for example, want to modernize their systems.
  • Chatbots. Large language models enable enterprises to refine chatbot training, enhancing both customer service and team capabilities. For instance, Salesforce has created Einstein Bot, an assistant that automates tasks and boosts team productivity.

Despite the opportunities they offer, large language models also pose challenges that need to be addressed, such as ensuring the quality of training data and mitigating biases in the input data. Another significant challenge is "hallucinations": even if the information is well-articulated, it might be fabricated. Nevertheless, the exploration of human-machine communication continues to advance, turning science fiction concepts into reality.