What is artificial intelligence and how does it work

Aaron Kirk

Artificial intelligence is the simulation of human intelligence by machines. Historically, the capacity to recognise patterns, learn, and comprehend language was limited to humans. However, advances in technology since the 1950s have given machines the ability to perform these very functions. Artificial intelligence is now applied in various forms across a range of industries, such as robotics, vehicle manufacturing, healthcare and finance. This article focuses on the development of large language models, which have advanced significantly since OpenAI introduced the first Generative Pre-trained Transformer (GPT-1) in 2018.

A GPT is a software model built on the concept of a neural network, inspired by the way the human brain processes information. GPTs are designed to receive information in text form and produce a human-like response. They are a type of large language model that uses deep learning techniques to perform natural language processing, and they are the foundation of the Jylo platform.
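
As a rough illustration only (not Jylo's or any GPT's actual implementation), the sketch below shows the core computation inside a single neural-network layer: inputs are multiplied by learned weights, biases are added, and a non-linearity is applied. Real models stack many such layers. The layer sizes, the ReLU activation and the random values here are illustrative assumptions.

```python
import numpy as np

def layer(inputs: np.ndarray, weights: np.ndarray, biases: np.ndarray) -> np.ndarray:
    """One dense layer: activation(inputs @ weights + biases)."""
    return np.maximum(0.0, inputs @ weights + biases)   # ReLU non-linearity

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))    # one input vector of 4 features
W = rng.normal(size=(4, 3))    # weights, learned during training
b = np.zeros(3)                # biases, also learned
print(layer(x, W, b))          # the layer's 3-value output
```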

Large language models are first exposed to a vast quantity of text data during the training stage. Text is broken into 'tokens', which are mapped into a multidimensional space so that the relationships between tokens can be calculated. The size and quality of the dataset determine the model's ability to spot complex patterns and nuanced relationships between tokens. The model assigns each input token a numeric value learned from the training data and passes those values through the neural network; parameters known as weights and biases are then adjusted to reduce the error between the model's output and the expected output, a process known as backpropagation. Once the model has been suitably refined, it is ready for inference.
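
To make this concrete, here is a minimal, self-contained sketch of the three ideas in the paragraph above: a toy whitespace tokeniser, an embedding matrix that maps tokens into a multidimensional space, and a single backpropagation step that adjusts the weights to reduce prediction error. The vocabulary, embedding size and learning rate are illustrative assumptions; real models train on billions of tokens across many layers.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Tokenise: a toy whitespace tokeniser over a tiny vocabulary.
corpus = "the model predicts the next token"
vocab = sorted(set(corpus.split()))
token_ids = [vocab.index(w) for w in corpus.split()]

# 2. Embed: each token id selects a row of a learned embedding matrix,
#    placing the token at a point in a multidimensional space.
dim = 8                                             # illustrative size
E = rng.normal(scale=0.1, size=(len(vocab), dim))   # token embeddings
W = rng.normal(scale=0.1, size=(dim, len(vocab)))   # output weights

def forward(context_id: int) -> np.ndarray:
    """Return a probability for every vocabulary token being next."""
    logits = E[context_id] @ W
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                          # softmax

# 3. Backpropagate: one gradient-descent step on a single
#    (context, next-token) pair to reduce cross-entropy error.
ctx, target = token_ids[0], token_ids[1]
probs = forward(ctx)
grad_logits = probs.copy()
grad_logits[target] -= 1.0                 # d(error) / d(logits)
grad_W = np.outer(E[ctx], grad_logits)     # d(error) / d(weights)
grad_E = W @ grad_logits                   # d(error) / d(embedding)
W -= 0.1 * grad_W                          # nudge the weights...
E[ctx] -= 0.1 * grad_E                     # ...and the embedding

print("error before:", -np.log(probs[target]))
print("error after: ", -np.log(forward(ctx)[target]))
```

Real training repeats this update over enormous batches of text until the error stops falling.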

Inference is the usage stage, where users provide prompts to the model and receive a response. The parameters are fixed at this point and cannot be altered by the user. The prompt is converted into tokens, and the token values are adjusted based on their relationships to one another and their positions within the prompt; this is how the model works out which tokens deserve the most attention. The adjusted values are processed through the network, and the output values represent the probability of each candidate being the next word in the output sequence. Each iteration of the output sequence is passed back through the network to calculate the probability of the following token, until the network returns a value that ends the sequence.
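
The loop below sketches this process under simplifying assumptions: `next_token_probs` is a hypothetical stand-in for a trained network with frozen parameters, and decoding is greedy (always picking the most probable token). Real models use learned attention weights and more sophisticated sampling strategies.

```python
import numpy as np

vocab = ["<eos>", "hello", "world", "how", "are", "you"]

def next_token_probs(sequence: list[int]) -> np.ndarray:
    """Hypothetical stand-in for a trained network with fixed
    parameters: returns a probability for every vocabulary token."""
    rng = np.random.default_rng(sum(sequence) + len(sequence))
    logits = rng.normal(size=len(vocab))
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                 # softmax probabilities

def generate(prompt_ids: list[int], max_len: int = 10) -> list[int]:
    """Repeatedly append the most probable next token until the
    end-of-sequence token appears or a length limit is reached."""
    sequence = list(prompt_ids)            # the prompt, tokenised
    while len(sequence) < max_len:
        probs = next_token_probs(sequence) # parameters stay fixed
        nxt = int(np.argmax(probs))        # greedy decoding
        sequence.append(nxt)
        if vocab[nxt] == "<eos>":          # sequence-ending token
            break
    return sequence

prompt = [vocab.index("hello"), vocab.index("how")]
print([vocab[i] for i in generate(prompt)])
```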

It is impossible to describe the functionality of a large language model without referring to mathematics, but this article should give users enough understanding to use the tool without getting bogged down in numbers and equations.
