Key artificial intelligence concepts

Aaron Kirk

Current artificial intelligence technologies are built on years of technological innovation. With so many layers of engineering involved, it is possible to spend countless hours learning the concepts behind each one. Instead, we have identified the eight concepts most relevant to understanding current LLM technologies.


Tokenization

Large Language Models are trained on a vast corpus of text data. Words are broken into units known as tokens. Each token is given a unique identification number and stored in the model's vocabulary. When a model is prompted, the prompt is broken into tokens, which are processed through a series of computations that use the learned relationships between tokens to ascertain context and produce an output token based on probability. Output tokens are then converted back into text before being displayed to the user. Costs are determined by the number of tokens used, as this is a measure of the computational cost of the prompt.
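
To make the idea concrete, the toy sketch below mimics the text-to-ID mapping with a hand-built vocabulary. Real tokenizers learn sub-word units from data, so the words and ID numbers here are invented purely for illustration.

    # A minimal sketch of tokenization using a hand-built vocabulary.
    # Real LLM tokenizers use learned sub-word units; these IDs are invented.
    vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}
    inverse_vocab = {token_id: word for word, token_id in vocab.items()}

    def encode(text: str) -> list[int]:
        """Map each word to its unique identification number."""
        return [vocab[word] for word in text.lower().split()]

    def decode(token_ids: list[int]) -> str:
        """Convert token IDs back into text."""
        return " ".join(inverse_vocab[token_id] for token_id in token_ids)

    token_ids = encode("the cat sat on the mat")
    print(token_ids)          # [0, 1, 2, 3, 0, 4]
    print(decode(token_ids))  # "the cat sat on the mat"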


Context Length

Models vary in their capacity to receive text input. Context length depends on multiple factors, such as architecture, memory, and hardware. Prompting a model with too much text will result in poor-quality output, as the model loses the ability to maintain context during computation. Providers are continuously improving their products; context length has increased significantly with the release of each GPT model, as the table below shows.


Year    Model          Context Length (tokens)
2019    GPT-2          1,024
2020    GPT-3          2,049
2022    GPT-3.5        4,096
2023    GPT-4          8,192
2023    GPT-4-32K      32,768
2024    GPT-4 Turbo    128,000
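
A practical way to avoid exceeding a model's context window is to count a prompt's tokens before sending it. The sketch below assumes OpenAI's open-source tiktoken library is installed; the 8,192-token limit matches the GPT-4 row in the table above.

    import tiktoken

    CONTEXT_LENGTH = 8192  # GPT-4's context window, per the table above

    encoding = tiktoken.get_encoding("cl100k_base")  # tokenizer used by GPT-4

    def fits_in_context(prompt: str, limit: int = CONTEXT_LENGTH) -> bool:
        """Count the prompt's tokens and compare against the context window."""
        token_count = len(encoding.encode(prompt))
        return token_count <= limit

    print(fits_in_context("Summarise the key artificial intelligence concepts."))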


Inference

Large Language Models undergo a two-stage process. The first is the training stage, where models are trained on a vast corpus of text data. The second stage is usage, known as inference. Inference refers to the operations performed by the LLM after it is prompted: information learned during the training stage is used to make probability-based calculations to produce an output.
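
The toy loop below illustrates the idea: at each step the model assigns a probability to each candidate token and the most likely one is appended to the output. The hard-coded probability table is invented for demonstration and stands in for a trained network, which would compute these probabilities at runtime.

    # Illustrative only: a hand-written probability table stands in for a
    # trained network's learned, probability-based predictions.
    next_token_probs = {
        "the": {"cat": 0.6, "mat": 0.4},
        "cat": {"sat": 0.7, "ran": 0.3},
        "sat": {"down": 0.9, "up": 0.1},
    }

    def generate(prompt: str, steps: int) -> str:
        tokens = prompt.split()
        for _ in range(steps):
            candidates = next_token_probs.get(tokens[-1])
            if candidates is None:
                break  # no learned continuation for this token
            # Greedy inference: pick the highest-probability next token.
            tokens.append(max(candidates, key=candidates.get))
        return " ".join(tokens)

    print(generate("the", steps=3))  # "the cat sat down"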

Encoding/Decoding

Encoding is the process of converting text data into a numeric format ready for computation; decoding converts the output values back into text.
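
The round trip below shows both steps using the tiktoken library (an assumption: it must be installed separately), encoding a string into integer token IDs and decoding those IDs back into the original text.

    import tiktoken

    encoding = tiktoken.get_encoding("cl100k_base")  # the encoding used by GPT-4

    token_ids = encoding.encode("Key artificial intelligence concepts")
    print(token_ids)                   # a list of integer token IDs
    print(encoding.decode(token_ids))  # "Key artificial intelligence concepts"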


Hallucination

Large Language Models use neural networks, loosely inspired by the architecture of the human brain, to elicit a response to a prompt. Humans are creative beings and rarely answer questions the same way twice. To ensure that LLMs can simulate a human response, the sampling probabilities are tuned to retain an element of creativity. These probabilities allow the model, on occasion, to output words or tokens which diverge from what would be considered fact. The model is also limited to the information stored within its training data, which may limit its ability to give accurate answers to particular questions. Models are most likely to hallucinate when asked for obscure or uncommon information, although they can hallucinate at any time.
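
One common tuning knob for this creativity is the sampling temperature. The sketch below uses the standard softmax-with-temperature formula (an implementation detail not described in this article) to show how a higher temperature flattens the probability distribution over candidate tokens, making less likely outputs easier to sample; the scores are invented.

    import math

    def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
        """Convert raw model scores into probabilities; higher temperature flattens them."""
        scaled = [score / temperature for score in logits]
        total = sum(math.exp(s) for s in scaled)
        return [math.exp(s) / total for s in scaled]

    logits = [4.0, 2.0, 1.0]  # invented scores for three candidate tokens
    print(softmax_with_temperature(logits, temperature=0.5))  # sharply peaked
    print(softmax_with_temperature(logits, temperature=2.0))  # flatter, more "creative"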


Model Parameters

Parameters determine how the model processes data and generates an output. During the training stage, parameters are adjusted to minimise the error between predicted and expected outputs. Other parameters determine how text is broken up and stored by the model; this is essential in allowing the model to understand the relationship between tokens. Models with more parameters can often interpret more complex and nuanced linguistic patterns, although too many parameters can result in diminishing returns as improvements plateau whilst computational costs increase.
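
To make "number of parameters" concrete, the sketch below builds a tiny feed-forward network with PyTorch (an assumption: the torch package is installed) and counts its trainable parameters. Production LLMs apply the same counting at a scale of billions.

    import torch.nn as nn

    # A tiny feed-forward network; production LLMs use billions of parameters.
    model = nn.Sequential(
        nn.Linear(16, 32),  # 16*32 weights + 32 biases = 544 parameters
        nn.ReLU(),
        nn.Linear(32, 4),   # 32*4 weights + 4 biases = 132 parameters
    )

    total = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(total)  # 676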

Fine-tuning

The practice of further training an existing model on task-specific data with the intention of changing the model's behaviour. This dilutes the influence of the original training data, strengthening the relationships between the relevant tokens and thus increasing the probability of a particular output. Harvey AI does this by using legal documents to further train GPT-4. When newer models are released, fine-tuned models can become outdated.
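
As a concrete illustration, OpenAI's fine-tuning API accepts training examples as a JSON Lines file of chat messages. The sketch below is a minimal outline assuming the openai Python package and an API key are set up; the file name "legal_examples.jsonl" is invented for illustration.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Upload the task-specific training data (JSON Lines of chat examples).
    training_file = client.files.create(
        file=open("legal_examples.jsonl", "rb"),
        purpose="fine-tune",
    )

    # Start a fine-tuning job on top of an existing base model.
    job = client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model="gpt-3.5-turbo",
    )
    print(job.id)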


Prompt Engineering

Large Language Models are powerful tools, but most users do not know how to get the most value from them. Prompt engineers are AI users who know how to structure text to get the best output from the model. There are techniques available on the internet which can assist users in engineering a good prompt. They mostly focus on order and specificity; remaining objective is also important when designing an effective prompt.
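
A simple way to apply those ideas is a reusable template that fixes the order of role, context, task, and output format. The sketch below is one illustrative structure, not a prescribed standard; the field values are invented.

    PROMPT_TEMPLATE = """You are a {role}.

    Context: {context}
    Task: {task}
    Output format: {output_format}"""

    prompt = PROMPT_TEMPLATE.format(
        role="technical editor",
        context="an article introducing key LLM concepts",
        task="summarise the article in three bullet points",
        output_format="plain text, one sentence per bullet",
    )
    print(prompt)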
