# 🧩 Tokens Explained
## Simple Explanation
A token is the basic unit that an LLM reads and writes. Tokens aren't exactly words; they're pieces of text that the model breaks language into before processing it.
Think of tokens like LEGO bricks. Just as a LEGO set breaks a castle into individual bricks, an LLM breaks your text into individual tokens. Some words are one token, some are split into multiple tokens, and sometimes multiple words can even be a single token.
## Why This Matters
Tokens directly affect your experience with AI in four ways:

- 💰 **Cost**: you pay per token with most AI APIs (both input *and* output tokens)
- 📏 **Limits**: every model has a maximum token limit (its context window)
- ⚡ **Speed**: more tokens mean slower response times
- 🎯 **Quality**: understanding tokens helps you write more efficient prompts
If you're building products with AI or using it professionally, tokens are where the rubber meets the road on budgets and performance.
## Understanding Tokens in Detail
### How Words Split Into Tokens

LLMs use a process called tokenization to break text into tokens. Here are some typical examples of how text splits:
| Text | Tokens | Count |
|---|---|---|
| `hello` | `hello` | 1 token |
| `Hello!` | `Hello` + `!` | 2 tokens |
| `chatbot` | `chat` + `bot` | 2 tokens |
| `unbelievable` | `un` + `believ` + `able` | 3 tokens |
| `I'm` | `I` + `'m` | 2 tokens |
| `GPT-4` | `G` + `PT` + `-` + `4` | 4 tokens |
| ` the` (with leading space) | ` the` | 1 token (the space is included!) |
### General Rules of Thumb

For English text:
- 1 token ≈ 4 characters (on average)
- 1 token ≈ 0.75 words (on average)
- 100 tokens ≈ 75 words
- 1,000 tokens ≈ 750 words (about 1.5 pages of text)
Some quick conversions:
- A tweet (280 characters) ≈ 70 tokens
- A short email ≈ 200 tokens
- A one-page document ≈ 500 tokens
- A 5-page report ≈ 2,500 tokens
- A full novel ≈ 100,000 tokens
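The rules of thumb above are easy to turn into a quick estimator. This is only a sketch of the heuristics, not a real tokenizer (actual counts depend on the model's vocabulary):

```python
def estimate_tokens_from_chars(text: str) -> int:
    """Rough English token estimate using 1 token ~= 4 characters."""
    return max(1, round(len(text) / 4))

def estimate_tokens_from_words(word_count: int) -> int:
    """Rough English token estimate using 1 token ~= 0.75 words."""
    return max(1, round(word_count / 0.75))

# A full 280-character tweet: 280 / 4 = 70 tokens
tweet_tokens = estimate_tokens_from_chars("x" * 280)

# 750 words: 750 / 0.75 = 1,000 tokens
page_tokens = estimate_tokens_from_words(750)
```

For budgeting purposes these heuristics are usually close enough; for exact counts you'd run the model's actual tokenizer.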
### Why Tokenization Works This Way
LLMs use algorithms like Byte Pair Encoding (BPE) to decide how to split text:

1. Start with individual characters.
2. Find the most frequent pair of adjacent symbols.
3. Merge that pair into a new token.
4. Repeat until you reach the desired vocabulary size.
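The loop above can be sketched in a few lines of Python. This is a toy illustration of the merge step on a tiny corpus, not a production tokenizer (real BPE implementations work on bytes and handle far larger vocabularies):

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Learn BPE merges by repeatedly merging the most frequent adjacent pair."""
    # Each word starts as a tuple of single-character symbols.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges, vocab

# "th" and then "the" get merged first because they are the most frequent pairs.
merges, vocab = bpe_merges(["the", "the", "the", "then", "thin"], num_merges=2)
```

After two merges, the frequent word "the" has collapsed into a single token, while the rarer "thin" is still split into pieces, which is exactly the behavior described above.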
Common words like "the", "and", "is" become single tokens because they appear so frequently. Rare or complex words get split into smaller pieces.
### Tokens in Different Languages
Tokenization isn't equally efficient across languages:
| Language | "Hello, how are you?" | Approximate Tokens |
|---|---|---|
| English | Hello, how are you? | ~6 tokens |
| Spanish | Hola, ¿cómo estás? | ~8 tokens |
| Chinese | 你好！你怎么样？ | ~11 tokens |
| Arabic | مرحبا كيف حالك؟ | ~13 tokens |
This means non-English text is more expensive and uses more of the context window. It's an important consideration for multilingual applications.
### Understanding Token Costs
Most AI APIs charge per token. Here's a rough idea (prices vary by provider and model):
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4o | ~$2.50 | ~$10.00 |
| Claude Sonnet | ~$3.00 | ~$15.00 |
Notice that output tokens cost more than input tokens. This means verbose responses cost more. A prompt that generates a 2,000-word essay costs significantly more than one that generates a 200-word summary.
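A back-of-the-envelope calculation makes the asymmetry concrete. This sketch uses the illustrative $10 per 1M output tokens from the table above and the 1 token ≈ 0.75 words rule:

```python
def output_cost_usd(words: float, usd_per_million_tokens: float = 10.00) -> float:
    """Approximate cost of a generated response, using 1 token ~= 0.75 words."""
    tokens = words / 0.75
    return tokens * usd_per_million_tokens / 1_000_000

essay_cost = output_cost_usd(2000)    # 2,000-word essay: ~2,667 tokens, ~$0.027
summary_cost = output_cost_usd(200)   # 200-word summary: ~267 tokens, ~$0.0027
```

An individual response is cheap, but the 10x gap compounds quickly across thousands of requests, which is why controlling output length matters.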
## Prompt Example
Understanding tokens helps you write more efficient prompts: saying more with fewer tokens.
### ❌ Bad Example

```text
I would really, really appreciate it if you could perhaps take some
time to write a very detailed and comprehensive and thorough explanation
about what the concept of machine learning is all about and maybe include
some examples if you don't mind and also if it's not too much trouble
could you make it easy to understand for someone who is a beginner
```
This prompt is 67 words (~89 tokens) but most of those tokens are wasted on filler words and unnecessary politeness. You're paying for tokens that add zero value.
### ✅ Improved Example

```text
Explain machine learning for a complete beginner.
Include 3 real-world examples.
Keep it under 200 words.
```
This prompt is 18 words (~24 tokens), about 73% fewer than the verbose version. It's clearer, cheaper, and typically produces a better response.
## Try It Yourself
Token Efficiency Challenge:
Take this verbose prompt and rewrite it to use fewer than 30 tokens while keeping the same intent:
```text
Could you please be so kind as to generate a comprehensive list of
approximately ten creative and unique ideas for blog posts that would
be related to the general topic of artificial intelligence and its
various applications in the modern business world today?
```
Hint: Strip out all filler words and get straight to the point. What are the essential pieces of information the model needs?
Bonus: Try to estimate the token count of both the original and your rewritten version using the "1 token ≈ 0.75 words" rule.
## Real-World Scenario
Scenario: You're building a customer service chatbot and need to manage costs. Each customer interaction averages 2,000 tokens (input + output).
```text
I'm building a customer support chatbot that handles ~5,000 conversations
per day. Each conversation averages about 2,000 tokens total.

Help me calculate:
1. Monthly token usage
2. Monthly cost at $5 per 1M input tokens and $15 per 1M output tokens
   (assume 40% input, 60% output)
3. Three strategies to reduce token usage without hurting quality
4. How to set up token budgets per conversation

Present the calculations in a clear table format.
```
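For reference, the arithmetic behind steps 1 and 2 looks like this (assuming a 30-day month, with the prices and input/output split from the scenario):

```python
conversations_per_day = 5_000
tokens_per_conversation = 2_000
days_per_month = 30                       # assumption: a 30-day month
input_share, output_share = 0.40, 0.60
input_price, output_price = 5.00, 15.00   # USD per 1M tokens, from the scenario

monthly_tokens = conversations_per_day * tokens_per_conversation * days_per_month
input_cost = monthly_tokens * input_share / 1_000_000 * input_price
output_cost = monthly_tokens * output_share / 1_000_000 * output_price
total_cost = input_cost + output_cost

# 300,000,000 tokens/month; $600 input + $2,700 output = $3,300/month
print(f"{monthly_tokens:,} tokens -> ${total_cost:,.0f}/month")
```

Note that even though only 60% of the tokens are output, they account for over 80% of the bill, which is why strategies in step 3 usually target response length first.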
Understanding tokens transforms AI from "magic that costs money" to a measurable, optimizable resource.
"How does tokenization work in LLMs, and why does it matter for prompt engineering?"
Strong Answer: Tokenization is the process of breaking text into smaller units (tokens) that an LLM can process. Most modern LLMs use subword tokenization methods like Byte Pair Encoding (BPE), which builds a vocabulary by iteratively merging the most frequent character pairs. Common words become single tokens while rare words are split into subword pieces. This matters for prompt engineering in several ways: first, tokens determine cost since API pricing is per-token; second, every model has a maximum context window measured in tokens; third, token efficiency varies by language, with English being more efficient than many other languages. A skilled prompt engineer writes concise prompts that maximize information per token, explicitly controls output length to manage costs, and understands that both input and output count toward the context window limit.
## Key Takeaways

- Tokens are the basic units LLMs use to read and write text
- One token is roughly 4 characters or 0.75 words in English
- Words can be 1 token or several tokens depending on complexity
- Tokens matter for cost (you pay per token), limits (context windows), and speed
- Output tokens cost more than input tokens in most APIs
- Non-English languages use more tokens for the same content
- Writing concise, efficient prompts saves money and gets better results
- Always think about the token budget when designing AI-powered applications