| Introduction This reference card helps you choose the right AI model in Jylo for the task in front of you. It explains how to think about the choice, shows where each model sits on a single scale from executing instructions to reasoning, and provides a side-by-side table to help you decide. |
Choosing a model
Choosing the right AI model is less about which one is “best” and more about matching the kind of thinking the model does to the kind of thinking your task needs. Picture a single scale.
At the left end is executing instructions: following clear directions on well-defined inputs, such as extracting names from documents, classifying support tickets or summarising a meeting transcript. You know exactly what you want, and the model fetches it.
At the right end is reasoning: working out what the right questions are, not just the answers. Examples include reviewing a lease for risks you have not named, or assessing a proposal for problems that surface only in implementation. You know the goal but not the shape of the answer. Reasoning of this kind rests on deep knowledge of the subject: the models at this end of the scale draw on a stronger grasp of the field, which is what lets them spot the issues you did not think to ask about.
Most tasks fall somewhere between the two ends. The further right your task sits, the more judgment it asks of the model—and the more it costs to run.
| Tip Default to Claude Sonnet 4.6. Sonnet sits in the middle of the scale and handles around nine in ten tasks well, at a moderate cost. Move right to Opus 4.6 or GPT-5.5 only for the harder tenth where Sonnet falls short. Move left to Haiku 4.5 or GPT-4.1 for high-volume or time-sensitive execution work. |
The scale
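[Figure: the scale, running from executing instructions on the left to reasoning on the right. ■ marks the recommended default (Sonnet 4.6). Claude models sit above the scale; GPT models below.]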
The scale is a continuum, not a set of boxes. Every model reasons to some degree; what changes as you move right is how far the model reasons beyond what you have spelt out, and how deeply it draws on knowledge of the subject. The pairs cluster naturally—Haiku 4.5 and GPT-4.1 toward execution, Sonnet 4.6 and GPT-5.4 around the middle, Opus 4.6 and GPT-5.5 toward reasoning—but each blends into the next rather than stopping at a hard line.
Which model for which task
Toward the execution end — Haiku 4.5 and GPT-4.1
Reach for these when the task is well-defined and the model’s job is to fetch, transform or classify rather than to think. They are fast, inexpensive and capable enough for the majority of high-volume work.
- Extract the parties, dates and key amounts from each of these contracts into a table.
- Classify support tickets as “billing”, “technical”, “account” or “other”.
- Read this email thread and pull out every action item, with the owner if mentioned.
- Summarise this hour-long meeting transcript into five bullet points and a list of decisions.
Around the middle — Sonnet 4.6 and GPT-5.4 (the default zone)
Reach for these when you know what you are looking for and want the model to find and reason about it within those bounds. This is the right default for most professional work.
- Review this employment contract and flag anything that conflicts with our standard handbook terms.
- Compare these three vendor proposals against the requirements I have listed and tell me which best fits, and where each has gaps.
- If there is an overage clause, report on what other clauses and dependencies should be checked throughout the document.
Toward the reasoning end — Opus 4.6 and GPT-5.5
Reach for these when you know the goal but not the shape of the answer. They distinguish themselves by identifying issues you did not think to ask about, weighing trade-offs the brief did not name and reaching judgments under uncertainty. They are noticeably more expensive and should be used selectively.
- Review this lease on the basis that our client is taking an assignment and identify any provisions that could expose them to unexpected liability or restrict their intended use.
- Read this M&A disclosure schedule and tell me what concerns a buyer should raise at the next negotiation session.
- Assess this engineering change proposal for risks that might not surface until implementation, and explain why.
- Read three years of board minutes and identify recurring themes that suggest unresolved governance issues.
How to choose
Start by deciding where your task sits on the scale, then refine with the rules below.
Start with Claude Sonnet 4.6. It handles around nine in ten tasks well, at a moderate cost. The rules below assume Sonnet is your starting point.
Move right (toward reasoning) when:
- Sonnet’s answer is close but not right, and small prompt changes do not fix it.
- The task requires the model to spot risks or implications you have not named.
- The output is high-stakes—going to a client, a board or a regulator.
- The work involves reasoning across many sources, where missing a connection is costly.
Choose between Opus 4.6 and GPT-5.5 based on cost tolerance and your existing platform preferences. Both sit at the reasoning end.
Move left (toward execution) when:
- The task is well-defined and the model is fetching, classifying or transforming rather than reasoning.
- The work runs at high volume, where unit cost matters.
- Latency is the main constraint—real-time chat or an interactive interface.
Haiku 4.5 is the faster and cheaper of the two; GPT-4.1 is a well-established alternative.
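The rules above can be sketched as a small routing function. This is only an illustration, not part of Jylo: the `Task` fields and the `suggest_models` function are hypothetical names, and the decision to let "move right" signals win when a task also shows "move left" signals is an assumption the card does not state.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task description; field names are illustrative only."""
    well_defined: bool = False       # fetching, classifying or transforming
    high_volume: bool = False        # unit cost matters
    latency_sensitive: bool = False  # real-time chat or interactive interface
    unnamed_risks: bool = False      # must spot risks you have not named
    high_stakes: bool = False        # output goes to a client, board or regulator
    many_sources: bool = False       # reasoning across many sources
    sonnet_fell_short: bool = False  # close but not right; prompt changes did not help


def suggest_models(task: Task) -> list[str]:
    """Return the models to try, starting from Sonnet 4.6 as the default."""
    move_right = (
        task.sonnet_fell_short
        or task.unnamed_risks
        or task.high_stakes
        or task.many_sources
    )
    move_left = task.well_defined or task.high_volume or task.latency_sensitive

    # Assumption: reasoning signals take precedence over execution signals.
    if move_right:
        return ["Claude Opus 4.6", "GPT-5.5"]
    if move_left:
        return ["Claude Haiku 4.5", "GPT-4.1"]
    return ["Claude Sonnet 4.6"]


print(suggest_models(Task(high_volume=True, well_defined=True)))
```

A task with no signals set stays on the default, which mirrors the advice to start with Sonnet and move only when there is a reason to.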
Model reference
| Model | Provider | Best suited to | Cost | Context | Worth knowing |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | Anthropic | High-volume execution | £ | 200K tokens | Fastest and cheapest in this set. The right default for high-volume execution. |
| GPT-4.1 | OpenAI | Established OpenAI execution | ££ | 1M tokens | Older general-purpose model. Reasonable when an established OpenAI option is preferred. |
| Claude Sonnet 4.6 | Anthropic | Most everyday work (default) | £££ | 1M tokens | The recommended default. Strong reasoning at a moderate cost; large context capacity. |
| GPT-5.4 | OpenAI | Multi-step, action-taking work | £££ | 1M tokens | Distinguished by action-taking: operates spreadsheets, builds decks and runs multi-step jobs. |
| Claude Opus 4.6 | Anthropic | Deep reasoning beyond the brief | £££££ | 1M tokens | Reach for Opus when Sonnet falls short. Strong on autonomous reasoning beyond the brief. |
| GPT-5.5 | OpenAI | The hardest reasoning tasks | £££££ | 1M tokens | Strongest on the most open-ended reasoning. |
Cost tiers: £ lowest · ££ low · £££ moderate · ££££ high · £££££ highest. The tiers show relative running cost per typical interaction.
A few things worth knowing
Cost depends on how you use the model. The cost rating reflects a typical mixed workload. Long documents, code generation and heavy reasoning all push cost higher. Cost differences compound at scale.
Context length is the input ceiling, not memory. Context length is the amount of text the model can consider in a single request, measured in tokens. A token is roughly three-quarters of a word, so 1,000 tokens is about 750 words, or two pages of A4. It is not memory: models do not retain information between separate conversations.
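As a rough illustration of the arithmetic, the sketch below converts a word count to an approximate token count using the three-quarters-of-a-word rule of thumb and compares it with a context length from the table. The function names are hypothetical, and real token counts vary with the model and the text, so treat the result as an estimate only.

```python
def estimate_tokens(word_count: int) -> int:
    """Estimate tokens from words, assuming one token is about 0.75 words."""
    # words / 0.75 == words * 4 / 3, rounded up using integer arithmetic
    return (word_count * 4 + 2) // 3


def fits_in_context(word_count: int, context_tokens: int) -> bool:
    """Check whether a document of this length is likely to fit in one request."""
    return estimate_tokens(word_count) <= context_tokens


HAIKU_CONTEXT = 200_000  # "200K tokens" from the model reference table

print(estimate_tokens(750))                     # about 1,000 tokens
print(fits_in_context(120_000, HAIKU_CONTEXT))  # a 120,000-word document
```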
Test before you commit for high-stakes work. No card can predict how a model will perform on your specific material. For important workflows, try your shortlisted model on a representative sample before relying on it.
Placements will shift as new models arrive. The positions and cost ratings compare these six models against one another, not against the wider market. New releases will change the picture; this card is reviewed periodically.
| Remember There is no single “best” model. Match the model to the kind of thinking your task needs. Default to Sonnet 4.6, move right when the work calls for judgment beyond the brief, and move left when volume or speed matters more than depth. |