Traditionally, AI software providers have built on a single underlying model, typically one of the GPT models behind ChatGPT. This approach has several drawbacks. To begin with, the underlying model must be capable of performing the most computationally demanding features of the software, so developers must select a large and expensive AI model even if those features are rarely used. That architecture is reflected in subscription prices, with users paying for AI capabilities they don't need or may not even know exist.
Jylo instead lets users stay cost-effective by giving them the ability to choose the model best suited to their use case, charging on a pay-for-what-you-use basis. The benefit is twofold: users avoid paying for capability they don't need, and they can pick the model that performs best, since certain models outperform others at given tasks. This does require that product creators understand the large language models available on the Jylo platform. Models vary in quality, speed, input/output cost and context length:
- Quality: a combination of the quality and quantity of the model's training data, the computational resources used to train it, and its architecture and tuning. Larger models can capture more complex linguistic patterns due to the scope of their training data.
- Speed: largely determined by the size of the model. Smaller models respond faster because each inference step involves fewer parameters, and therefore less computation and time per token.
- Input/output cost: cost is measured in tokens. Input text is broken into tokens, which the model processes to produce output tokens; these are converted back to text before being displayed to the user. Users are charged for the tokens they send to the model, including all prompts and uploaded documents, as well as the tokens the model returns. Token prices vary between models; Jylo sells credits that are compatible with every model and deplete in line with each model's cost per token (see the cost sketch after this list).
- Context length: the maximum number of tokens a model can process in a single exchange. It varies with the model's architecture, tokenization strategy, computational resources and tuning.
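As a rough illustration of how token-based pricing works, the sketch below counts input tokens with the open-source tiktoken tokenizer and estimates credit usage. The model names, per-token prices and credit conversion are placeholder assumptions for illustration, not Jylo's actual figures.

```python
# Sketch: estimating the credit cost of a request under token-based pricing.
# The tokenizer is OpenAI's open-source tiktoken library; the prices below
# are hypothetical placeholders, not Jylo's actual rates.
import tiktoken

# Hypothetical per-1,000-token prices (in credits) for two illustrative models.
MODEL_PRICES = {
    "large-model": {"input": 10.0, "output": 30.0},
    "small-model": {"input": 0.5, "output": 1.5},
}

def estimate_credits(model: str, prompt: str, expected_output_tokens: int) -> float:
    """Estimate credits consumed: input tokens cover everything sent to the
    model (prompts plus uploaded documents); output tokens cover its reply."""
    enc = tiktoken.get_encoding("cl100k_base")  # a common tokenizer encoding
    input_tokens = len(enc.encode(prompt))
    prices = MODEL_PRICES[model]
    return (input_tokens * prices["input"]
            + expected_output_tokens * prices["output"]) / 1000

prompt = "Summarise the attached contract in three bullet points."
for model in MODEL_PRICES:
    print(model, round(estimate_credits(model, prompt, expected_output_tokens=200), 3))
```

Running this shows the same request depleting credits at very different rates depending on the model chosen, which is why matching the model to the task matters.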
To measure the overall quality of large language models (LLMs), various benchmarks are used to evaluate their skills. One such test is the Massive Multitask Language Understanding (MMLU) benchmark, which challenges models with multiple-choice questions across subjects such as mathematics and philosophy, assessing their general knowledge. As the technology rapidly advances, benchmarks like MMLU are becoming saturated, with leading models approaching near-perfect scores. Even so, these tests help ensure models remain effective, reliable and safe for everyday use.
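To make the benchmarking idea concrete, here is a minimal sketch of how an MMLU-style score is computed: the model answers multiple-choice questions drawn from many subjects, and its score is the fraction it answers correctly. The `ask_model` function and the sample questions are hypothetical stand-ins, not the real benchmark harness or its data.

```python
# Sketch: scoring a model on MMLU-style multiple-choice questions.
# `ask_model` is a hypothetical stand-in for a call to any LLM; the
# questions are illustrative, not items from the actual benchmark.
from collections import defaultdict

QUESTIONS = [
    {"subject": "mathematics", "question": "What is 7 * 8?",
     "choices": ["54", "56", "64", "58"], "answer": "B"},
    {"subject": "philosophy", "question": "Who wrote 'Critique of Pure Reason'?",
     "choices": ["Hume", "Kant", "Hegel", "Locke"], "answer": "B"},
]

def ask_model(question: str, choices: list[str]) -> str:
    """Placeholder: a real harness would prompt the model and parse its
    single-letter answer. Here we always answer 'B' for illustration."""
    return "B"

def mmlu_style_score(questions: list[dict]) -> dict[str, float]:
    """Return per-subject accuracy: the fraction answered correctly."""
    correct, total = defaultdict(int), defaultdict(int)
    for q in questions:
        total[q["subject"]] += 1
        if ask_model(q["question"], q["choices"]) == q["answer"]:
            correct[q["subject"]] += 1
    return {subject: correct[subject] / total[subject] for subject in total}

print(mmlu_style_score(QUESTIONS))  # e.g. {'mathematics': 1.0, 'philosophy': 1.0}
```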
Knowledge of the various LLMs and their features is essential for product creators who want to remain cost-effective and produce the best results for any given task. AI users don't need a deep understanding of the models, but some knowledge of each model's behaviour will help them maximise the value they extract from using AI.
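As a sketch of how a product creator might weigh these trade-offs in practice, the snippet below picks the cheapest model whose quality and context length meet a task's requirements. The model names, scores and prices are invented for illustration and do not describe any real catalogue.

```python
# Sketch: choosing the cheapest model that satisfies a task's requirements.
# Model names, quality scores and prices are invented for illustration only.
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    quality: float         # e.g. a benchmark score between 0 and 1
    context_length: int    # maximum tokens per request
    credits_per_1k: float  # blended input/output price in credits

CATALOGUE = [
    Model("small-fast", quality=0.70, context_length=16_000, credits_per_1k=0.5),
    Model("mid-range", quality=0.82, context_length=128_000, credits_per_1k=3.0),
    Model("large-flagship", quality=0.90, context_length=200_000, credits_per_1k=15.0),
]

def pick_model(min_quality: float, needed_context: int) -> Model:
    """Return the cheapest model meeting the quality and context thresholds."""
    eligible = [m for m in CATALOGUE
                if m.quality >= min_quality and m.context_length >= needed_context]
    if not eligible:
        raise ValueError("No model in the catalogue meets these requirements")
    return min(eligible, key=lambda m: m.credits_per_1k)

# A short summarisation task doesn't need a flagship model.
print(pick_model(min_quality=0.75, needed_context=20_000).name)  # mid-range
```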