LLMs Explained

Large Language Models (LLMs) have revolutionized the field of artificial intelligence (AI): they can understand and generate human language with remarkable proficiency. A product of deep learning applied to extensive datasets, LLMs are built on the principles of language modeling and the transformer architecture. At Posos, we are leveraging their power to move forward with multilingual posology structuration and to build a global drug database. This guide explains how LLMs work, their impact on AI technology, and, most importantly, how Posos R&D uses them to create innovative technologies.

Summary

This article is part of a series explaining the ongoing work of Posos' Research and Development department. If you are interested in this series, you may also read our article on how Posos is using AI to improve the use of medicines by healthcare professionals and patients; an article on how we are tackling automatic data annotation for posology structuration is currently in preparation.

What is the difference between LLM and AI?

Defining AI and its Relation with LLM

AI is a broad field that encompasses a variety of subfields, including LLMs. AI, as a first approximation, refers to the development of computer algorithms that can perform tasks requiring human intelligence. It is the ability of a machine to mimic human intelligence or even outperform it on a given task. These tasks could range from pattern recognition and decision-making to language understanding and data generation.

LLMs are a particular kind of AI for language: they leverage machine learning algorithms to understand and generate human language. They are capable of a variety of tasks such as text prediction, language translation, and even chatbot functionality.

How LLMs Enhance AI Capabilities

LLMs augment AI capabilities by leveraging their training on massive amounts of text data to generate coherent and contextually relevant text. Their most essential feature is their capacity for generalization: it allows them to produce text that mirrors human language beyond the exact examples they were trained on. For example, if a model learns to complete many sentences of the kind “The capital of France is…” with the right capital city, it might learn to produce the suitable city when prompted with the right instruction, i.e. answer “Madrid” to “What is the capital of Spain?”.

LLMs also enable AI systems to handle complex tasks such as content generation, text personalization, and translation. They empower AI to draft legal briefs, analyze customer reviews, and even translate documents into different languages.

Moreover, LLMs like GPT-4 [1], successors to the 175-billion-parameter GPT-3, can perform tasks with surprising accuracy, even exceeding their predecessors in unexpected ways, for example in translation, especially for pairing idiomatic expressions, or in producing concise and coherent text summaries. This ability of LLMs to adapt and improve as they grow augments the capabilities of AI systems, making them more efficient and versatile in handling language-related tasks.

Understanding Large Language Models

One key component of LLMs is the transformer architecture [2], which enables them to handle long-range dependencies in text. Transformers rely on a mechanism called self-attention, allowing the model to weigh the importance of words in a sentence when making a prediction.
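
To make the self-attention idea concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation of the transformer. The matrices are random stand-ins for learned projections, not weights from any real model.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

seq_len, d = 4, 8                          # 4 tokens, 8-dimensional embeddings
x = np.random.randn(seq_len, d)            # toy token embeddings

# Random stand-ins for the learned query/key/value projections.
Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
Q, K, V = x @ Wq, x @ Wk, x @ Wv

scores = Q @ K.T / np.sqrt(d)              # how strongly each token attends to the others
attn = softmax(scores, axis=-1)            # each row sums to 1: the attention weights
output = attn @ V                          # each token becomes a weighted mix of the values

print(attn.round(2))                       # the "importance of words" described above
```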

Despite their impressive capabilities, LLMs do not "understand" language in the way humans do. But it is alright!

As Patrick Winston, one of AI's founding fathers, put it: “There are lots of ways of being smart that aren't smart like us”.

LLMs excel at pattern recognition and can generate plausible responses, but they lack an innate grasp of meaning or context beyond what they have been trained on.

What is a large language model?

Language models are models that can predict the likelihood of a word in a given sentence. A good language model should determine that a noun, and more specifically a proper noun, is the most likely word after the sentence fragment “The capital of France is”. An even better language model, which has processed similar sentences about this information in its training data, should compute that “Paris” is more probable than any other city name. LLMs, for their part, have a significantly larger number of parameters than smaller language models, enabling them to capture more complex and nuanced patterns in language.
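
As a minimal illustration of this next-word prediction, here is a sketch using the Hugging Face transformers library and the small GPT-2 model (illustrative choices, not the models used at Posos):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative small model, not the one used at Posos.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits        # one score per vocabulary word, at each position

# Convert the scores at the last position into probabilities for the next word.
next_word_probs = torch.softmax(logits[0, -1], dim=-1)
top_probs, top_ids = next_word_probs.topk(5)
for p, i in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(int(i))!r}: {p.item():.3f}")
# A well-trained model should rank " Paris" among the top candidates.
```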

Language models rely on two types of architecture: encoder and decoder, which serve different purposes.

  • Encoder-only models compute the probability of each word from all the surrounding words and are primarily used for language understanding tasks. They are not, however, suited to generating coherent and contextually relevant text.
  • Decoder-only models, on the other hand, compute the probability of each word based only on the words before it, and thus perform well on text generation tasks. As LLMs are made first and foremost for language generation, most of them rely on a decoder-only architecture (a short sketch follows this list).
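
The two families can be tried side by side with off-the-shelf models; the model names below are illustrative examples of each architecture, not the models used at Posos:

```python
from transformers import pipeline

# Encoder-only (BERT-like): predicts a masked word from its full left and right context.
fill = pipeline("fill-mask", model="bert-base-uncased")
print(fill("The capital of France is [MASK].")[0]["token_str"])

# Decoder-only (GPT-like): predicts the next words from the left context only.
generate = pipeline("text-generation", model="gpt2")
print(generate("The capital of France is", max_new_tokens=5)[0]["generated_text"])
```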

The LLM architecture alone is not enough to perform text generation. Indeed, an LLM only produces the likelihood of the next word given the beginning of a text. A decoding strategy must therefore be employed to iteratively choose which word will complete the given sentence. The simplest strategy is greedy search, where the most probable word is chosen at each step. But each newly chosen word changes the probabilities of the following ones, so greedy search might miss more interesting combinations of words. Other strategies rely on beam search or sampling, where several candidate sequences are kept in memory while decoding.
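
As a sketch of the simplest strategy, greedy search can be written in a few lines on top of a small causal model such as GPT-2 (again an illustrative choice):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")    # illustrative small model
model = AutoModelForCausalLM.from_pretrained("gpt2")

def greedy_generate(prompt: str, max_new_tokens: int = 10) -> str:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = model(input_ids).logits
        next_id = logits[0, -1].argmax()             # greedy: always take the single most likely word
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=-1)
    return tokenizer.decode(input_ids[0])

print(greedy_generate("The capital of France is"))

# Beam search and sampling keep several candidates instead; with Hugging Face this is e.g.
# model.generate(input_ids, num_beams=4) or model.generate(input_ids, do_sample=True).
```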

The size of the model, in terms of parameters, influences the abilities of LLMs. Today, LLMs typically have billions or even hundreds of billions of parameters, whereas BERT [3], one of the first transformer-based language models, has “only” 340 million. LLMs also rely on a much greater amount of training data. While BERT was trained in 2018 on roughly 3 billion words, mainly from English Wikipedia, Meta’s recent LLM Llama 3 [4] was trained on over 15 trillion tokens, from many sources and in several languages.

How does a Large Language Model Work?

How to train your LLM

A relatively new (~2018) idea in AI was to divide training into two steps: unsupervised pre-training followed by supervised fine-tuning. During pre-training, the model feeds on a very large dataset to build a representation of language. This pre-trained model has little interest in itself; it serves as a basis for what comes next, like a stem cell that will later be specialized to handle a precise task. This specialization is called fine-tuning: supervised learning, but since the model has already learned a lot during pre-training, far fewer high-quality annotated examples are required. The same pre-trained model can be fine-tuned to solve numerous problems, each with only a few high-quality annotated examples. LLMs go even further: in some cases, for generation problems, fine-tuning is not needed at all! This is called zero-shot learning, as LLMs can produce a coherent output for a task without having seen any task-specific examples, thanks to their vast generalization capability. However, some fine-tuning can still be required for specific LLM-related tasks.
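
As a minimal sketch of zero-shot use, one can simply prompt a pre-trained model with a task it was never fine-tuned for. The example below is purely illustrative: a small base model like GPT-2 will do this poorly, while larger, instruction-tuned LLMs handle such prompts far better.

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")   # illustrative small base model

# No task-specific fine-tuning: the task is expressed entirely in the prompt.
prompt = "Translate into French: 'Take one tablet twice a day.'\nFrench:"
print(generator(prompt, max_new_tokens=20)[0]["generated_text"])
```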

Understanding the Working Mechanisms of LLMs

How does an LLM work? In a nutshell, its pre-training consists of making the model, given a sentence, predict the following word. It works like a parrot repeating sentences. But because of the extraordinary number of sentences it works on, this exercise makes the model acquire vast knowledge. To keep the same example, if it is trained to complete “The French capital is…” with “Paris”, chances are it will also learn, from similar sentences, that London is the capital of the United Kingdom.

Once trained, a generative AI can remarkably handle almost any text-related task, provided it is formulated as a generation problem. Of course, some prompt-engineering difficulties might arise at some point: a proper formulation will yield better results. Any reader with a little experience of ChatGPT has probably already encountered such problems.

Factors Influencing the Performance of LLMs

The performance of LLMs is influenced by numerous factors, first and foremost the quality of the training data, which has to cover the language the LLM aims to model as diversely and comprehensively as possible. It thus has to be unbiased, representative of the target context, and often quite voluminous. Modeling choices also play a role that should not be overlooked: both the choice of fine-tuning tasks and the choice of model have a significant impact on performance. Larger models, with more parameters, tend to perform better as they can capture more complex patterns.

These factors, among others, shape the performance of LLMs, underlining the importance of careful model design, diverse data collection, and thoughtful model interaction.

Improving the efficiency of LLMs

Looking to the future, enhancing the performance and explainability of LLMs, i.e. providing a better understanding of their inner workings, is a primary area of focus. One limitation is that most LLMs are trained only on textual data, which might prevent the more complete understanding of the world that images and sounds could provide. Multimodal models, trained on data of different types (e.g. text and images), have been developed to address this issue and open up more use cases, such as including images in instructions to LLMs or even producing them. The use of mixtures of multiple LLMs, or Mixture-of-Agents, has also been explored as a way to leverage the collective strengths of multiple models, further enhancing AI capabilities.

Researchers are exploring innovative strategies to improve both computational and memory efficiency without compromising model performance. Computational efficiency is key, especially as LLMs continue to grow in size. Strategies such as optimizing computation time and throughput of inference tasks are under investigation.

Memory efficiency is another critical concern. Techniques like minimizing memory usage during fine-tuning and quantizing models into smaller data types help manage the growing number of parameters in LLMs.
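
As a toy illustration of why quantization saves memory, here is a sketch that maps float32 values to int8 with a single scale factor; real methods (for instance keeping one scale per block of weights) are more sophisticated, but the idea is the same:

```python
import numpy as np

weights = np.random.randn(4, 4).astype(np.float32)      # stand-in for a block of model weights

scale = np.abs(weights).max() / 127.0                    # one scale factor for the whole block
q_weights = np.round(weights / scale).astype(np.int8)    # stored in 1 byte instead of 4
dequantized = q_weights.astype(np.float32) * scale       # approximate reconstruction at inference

print("memory:", weights.nbytes, "bytes ->", q_weights.nbytes, "bytes")
print("max reconstruction error:", np.abs(weights - dequantized).max())
```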

Finally, LLMs can sometimes generate false information, and such errors are difficult to detect automatically. LLMs should therefore be used alongside more traditional approaches to ensure reliable results. At Posos, our goal is to leverage their predictive power to create a global drug database and to expand our products to other languages.

[1]: GPT-4 Technical Report, https://arxiv.org/abs/2303.08774

[2]: Attention Is All You Need, https://arxiv.org/abs/1706.03762

[3]: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, https://arxiv.org/abs/1810.04805

[4]: Introducing Meta Llama 3, https://ai.meta.com/blog/meta-llama-3/

Raphaël Teboul
Data Scientist
