Traditionally, AI software providers have built on a single underlying AI model, typically one of OpenAI's GPT models (the family behind ChatGPT). There are several drawbacks to this approach. To begin with, the underlying model must be capable of handling the most computationally demanding features of the software, so developers must select a large and expensive AI model even if those features are rarely used. This architecture is reflected in subscription prices, with users paying for AI capabilities they don't need or may not even know exist.
Jylo instead keeps users cost-effective by letting them choose the model best suited to their use case and charging on a pay-for-what-you-use basis. The benefit is twofold: users pay only for the capability they need, and they can match each task to the model that performs it best, since certain models outperform others at given tasks. This does require that product creators understand the large language models available on the Jylo platform. Models vary in quality, speed, input/output cost and context length.
- Quality: A combination of the quality and quantity of the model's training data, its computational resources, architecture and tuning. Larger models can capture more complex linguistic patterns because they have more parameters and are typically trained on broader data.
- Speed: Largely determined by the size of the model. Smaller models respond faster because generating each token requires less computation and therefore less time.
- Input/output cost: Cost is measured in tokens. Input text is broken into tokens, which the model processes to produce output tokens; these are converted back to text before being displayed to the user. Users are charged for the tokens they supply to the model (all prompts and uploaded documents) and for the tokens it generates in response. Token prices vary between models, so Jylo charges users in credits, which are compatible with every model and deplete in line with each model's cost per token (a worked sketch follows this list).
- Context length: The maximum number of tokens a model can process in a single exchange. It varies with the model's architecture, tokenization strategy, computational resources and tuning.
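To make the pay-per-token pricing concrete, here is a minimal sketch of how credit depletion might be computed from token usage. The model names, per-token prices and rates below are illustrative assumptions for the sake of the example, not Jylo's actual figures.

```python
# Illustrative sketch of pay-per-token pricing. All model names and prices
# below are hypothetical placeholders, not Jylo's actual figures.

# Hypothetical price list: (input price, output price) in credits per 1,000 tokens.
PRICES = {
    "small-fast-model": (0.5, 1.5),
    "large-quality-model": (10.0, 30.0),
}

def credits_used(model: str, input_tokens: int, output_tokens: int) -> float:
    """Credits depleted by one request, using that model's per-token rates."""
    input_price, output_price = PRICES[model]
    return (input_tokens / 1000) * input_price + (output_tokens / 1000) * output_price

# The same job costs far less on a smaller model, which is why matching
# the model to the task matters.
for model in PRICES:
    print(model, credits_used(model, input_tokens=2_000, output_tokens=500))
```

Running this shows the hypothetical small model completing the same request for a fraction of the credits the large model consumes, which is the trade-off product creators weigh when a task does not need the larger model's quality.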
The quality of language models is commonly measured using the Massive Multitask Language Understanding (MMLU) benchmark, a test designed to assess the ability of large language models to comprehend natural language. The test comprises roughly 16,000 multiple-choice questions across 57 academic subjects, including mathematics, medicine, law and philosophy. It aims to approximate human exam conditions by challenging models with varied question types that test problem-solving, memory and reasoning. Benchmark standards rise with each release of upgraded models, and we expect this benchmark to become outdated in the coming years, as models are already close to achieving a perfect score; for now, Jylo uses this metric to evaluate the quality of large language models.
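As a rough illustration of how an MMLU-style score is produced, the sketch below grades a model's multiple-choice answers against an answer key and reports the fraction correct. The questions and answers here are placeholders; a real MMLU run covers the full set of roughly 16,000 questions across 57 subjects.

```python
# Minimal sketch of MMLU-style scoring: the benchmark score is simply the
# share of multiple-choice questions answered correctly. The answer key and
# model answers below are placeholders, not real MMLU data.

answer_key = {"q1": "B", "q2": "D", "q3": "A", "q4": "C"}
model_answers = {"q1": "B", "q2": "D", "q3": "C", "q4": "C"}

def mmlu_score(key: dict[str, str], answers: dict[str, str]) -> float:
    """Fraction of questions where the model's choice matches the key."""
    correct = sum(1 for q, a in key.items() if answers.get(q) == a)
    return correct / len(key)

print(f"Score: {mmlu_score(answer_key, model_answers):.0%}")  # -> Score: 75%
```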
Knowledge of the available LLMs and their characteristics is essential for product creators who want to remain cost-effective and produce the best results for any given task. AI users don't need a deep understanding of the models, but some familiarity with how they behave will help users maximise the value they extract from AI.