How many words do you really need? The vocabulary thresholds, by language
Fluency is not a number, but coverage is. A field guide to the vocabulary sizes that unlock conversation, reading and television in 12 languages — and a roadmap for getting there with Lemnly.
"How many words do I need to be fluent?" is the wrong question, but it is also the question every learner asks, so let’s answer it honestly. Fluency is a constellation of skills, but coverage — the share of words you know in a given text — is measurable, and it tracks almost perfectly to whether you can read it without exhausting yourself.
The three thresholds
Decades of corpus research, especially by Paul Nation and Norbert Schmitt, converge on three useful targets:
- 2,000 word families — handles ~80% of everyday spoken text. Survival conversation. You can order food and ask directions, but you’ll lose the thread of a podcast within minutes.
- 5,000 word families — handles ~95% of typical written text. The threshold where reading becomes pleasurable, not painful. You can read a novel, but about one running word in twenty is still unfamiliar, so expect to reach for the dictionary several times a page.
- 8,000–9,000 word families — the 98% mark. The point at which you can read most novels and follow most TV without guessing. You stop being a learner and start being a reader.
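Coverage, as used in the thresholds above, is simple to compute for any text you have on hand. The sketch below is a minimal illustration, not Lemnly's implementation: the `coverage` function, the regex tokenizer and the sample sentence are all assumptions made for the example, and it matches raw surface forms rather than word families.

```python
import re


def coverage(text: str, known_words: set[str]) -> float:
    """Return the share of running words in `text` found in `known_words`."""
    # Naive tokenizer (an assumption): lowercase runs of letters only.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    known_count = sum(1 for token in tokens if token in known_words)
    return known_count / len(tokens)


# Made-up sample: 13 running words, 2 of them ("mat", "rug") unknown.
sample = "The cat sat on the mat and the dog sat on the rug"
known = {"the", "cat", "sat", "on", "and", "dog"}

print(f"{coverage(sample, known):.0%}")  # prints "85%" (11 of 13 tokens)
```

Note that coverage counts running words, so a frequent word like "the" is counted every time it appears. That is why a small core vocabulary covers such a large share of text.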
What "word family" actually means
A word family is a headword (the lemma) plus its inflected forms and close derivations. Run, runs, running, ran and runner together form one word family. This is why the numbers feel small: they describe concepts, not surface forms. If you counted every surface form separately, the numbers would be 2–3× higher and just as misleading.
Lemnly counts in word families by default. When you see "3,420 cards" in your stats, that’s 3,420 distinct lemmas — not 12,000 verb conjugations.
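To make the gap between surface forms and word families concrete, here is a minimal sketch. The `FAMILY_OF` lookup is a tiny hand-made table invented for this example; a real tool would use a lemmatizer or a published family list, and nothing here describes how Lemnly groups words internally.

```python
from collections import defaultdict

# Hypothetical hand-made lookup: surface form -> family headword.
FAMILY_OF = {
    "run": "run", "runs": "run", "running": "run", "ran": "run", "runner": "run",
    "house": "house", "houses": "house",
}


def group_into_families(tokens: list[str]) -> dict[str, set[str]]:
    """Group surface forms under their family headword."""
    families: dict[str, set[str]] = defaultdict(set)
    for token in tokens:
        form = token.lower()
        # Forms missing from the lookup are treated as their own headword.
        families[FAMILY_OF.get(form, form)].add(form)
    return dict(families)


tokens = ["Run", "runs", "running", "ran", "runner", "house", "houses"]
families = group_into_families(tokens)

print(len({t.lower() for t in tokens}), "surface forms")  # 7 surface forms
print(len(families), "word families")                     # 2 word families
```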
Why the numbers differ by language
Languages don’t pack information the same way. Mandarin builds new concepts by combining a small set of characters, so the raw lemma count to hit 95% reading coverage is lower (~4,000), but the per-character workload is higher. German, with its compound words, gives you a lot of leverage from a smaller core — once you know Haus and Tür, Haustür is free. Russian, with rich inflection, demands more morphological awareness than raw lemma count suggests.
Approximate 95% reading thresholds
| Language | Approx. word families for 95% coverage | Notes |
|---|---|---|
| English | ~5,000 | Latin/Germanic split adds difficulty |
| Spanish | ~5,000 | Regular morphology, transparent |
| French | ~5,200 | Many Latinate cognates for English speakers |
| Italian | ~5,300 | Very phonetic, fast progress |
| Portuguese | ~5,200 | Similar profile to Spanish |
| German | ~5,500 | Compounds count; once you have parts, you get wholes |
| Dutch | ~5,400 | Heavy compound usage like German |
| Polish | ~6,000 | Seven cases inflate apparent count |
| Russian | ~6,500 | Rich morphology, perfective/imperfective pairs |
| Turkish | ~6,000 | Agglutinative — lemmas are smaller chunks |
| Mandarin (lemmas) | ~4,000 | Plus ~3,000 characters to recognise |
| Japanese | ~5,500 | Plus ~2,000 kanji for fluent reading |
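If you want to turn the table into a personal target, a lookup like the one below is enough. It is only a sketch: the figures are copied from the table above and are approximate, and the `families_remaining` helper is a name made up for this example.

```python
# Approximate 95% reading thresholds, copied from the table above.
THRESHOLD_95 = {
    "English": 5000, "Spanish": 5000, "French": 5200, "Italian": 5300,
    "Portuguese": 5200, "German": 5500, "Dutch": 5400, "Polish": 6000,
    "Russian": 6500, "Turkish": 6000, "Mandarin": 4000, "Japanese": 5500,
}


def families_remaining(language: str, known_families: int) -> int:
    """Word families still needed to reach the approximate 95% threshold."""
    return max(0, THRESHOLD_95[language] - known_families)


print(families_remaining("Russian", 4820))   # 1680
print(families_remaining("Mandarin", 4820))  # 0
```

For Mandarin and Japanese the lookup ignores the separate character workload listed in the Notes column.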
The honest caveats
These numbers measure the vocabulary that exists in your head. They say nothing about retrieval speed, grammar, listening comprehension, or the courage to actually open your mouth in a café. A learner with 4,000 active words and a hundred hours of listening practice will usually outperform one with 6,000 cards and no listening.
Also: the curves are not linear. Going from 2,000 to 5,000 is a massive jump in capability — you go from survival to reading. Going from 5,000 to 8,000 is a much smaller perceived jump per card learned, but it’s the difference between "I can read" and "I read for pleasure."
How to use Lemnly to hit each threshold
A Lemnly-shaped roadmap, by threshold:
- 0 → 2,000 (survival). Start with a foundation deck — Lemnly’s public "Top 2k" decks per language are free and frequency-ordered. Add 20 new cards a day, review for 10 minutes. You’ll hit 2,000 in about 100 days.
- 2,000 → 5,000 (pleasurable reading). Switch modes. Stop adding from frequency lists, start importing real text. One article a day via URL, one chapter of a graded reader a week. The cards now come from your interests, not from a list. Pace: about 10 new cards a day. Expect 9–12 months.
- 5,000 → 8,000 (fluent reading). Read full novels, harder articles, professional content. The yield per import drops, because you already know most words, but the cards you do get are the high-value rare ones. Pace: 5 new cards a day. At that pace the remaining 3,000 cards take about 600 days, so expect roughly 20 months.
- 8,000+. You don’t need a plan anymore. Read. Import the occasional article when a word catches you. Review for five minutes a day. You’re a reader.
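The stage lengths in the roadmap above are plain division: cards still to learn, divided by new cards per day. The sketch below makes that arithmetic explicit. It assumes every new card sticks and no days are skipped, so treat the results as a best-case estimate; `days_to_reach` and `STAGES` are names made up for the example.

```python
import math


def days_to_reach(start: int, target: int, cards_per_day: float) -> int:
    """Days needed to go from `start` to `target` cards at a steady pace."""
    if cards_per_day <= 0:
        raise ValueError("cards_per_day must be positive")
    return math.ceil(max(0, target - start) / cards_per_day)


# (start, target, new cards per day), taken from the roadmap above.
STAGES = [(0, 2000, 20), (2000, 5000, 10), (5000, 8000, 5)]

for start, target, pace in STAGES:
    days = days_to_reach(start, target, pace)
    print(f"{start:>5} -> {target:>5} at {pace:>2}/day: {days} days")
```

Changing the pace for any stage shows quickly how sensitive the timeline is: one extra card a day matters far more in the slow final stage than in the first.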
What "active" vocabulary means in your stats
Lemnly distinguishes three states for any lemma:
- Learning — in your deck, but not yet stable. Don’t count it.
- Active — stable, in regular review, retention above 80%. This is the number to track.
- Mature — recalled correctly for 30+ days running. Effectively yours.
When you see "your vocabulary: 4,820" in your Lemnly dashboard, that’s active + mature. It’s the number that maps to the coverage thresholds above.
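The three states can be read as a small decision rule. The sketch below is one possible interpretation, not Lemnly's actual logic: the `Card` fields, the use of retention above 80% as a stand-in for "stable", and the order of the checks are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Card:
    lemma: str
    retention: float          # share of reviews answered correctly, 0.0 to 1.0
    correct_streak_days: int  # consecutive days of correct recall


def state(card: Card) -> str:
    """Classify a card as 'mature', 'active' or 'learning' (assumed rules)."""
    if card.correct_streak_days >= 30:
        return "mature"
    if card.retention > 0.80:
        return "active"
    return "learning"


def vocabulary_size(cards: list[Card]) -> int:
    """Count active + mature cards, the figure the dashboard number maps to."""
    return sum(1 for card in cards if state(card) in {"active", "mature"})


# Made-up example deck.
deck = [
    Card("Haus", retention=0.95, correct_streak_days=45),  # mature
    Card("Tür", retention=0.85, correct_streak_days=12),   # active
    Card("Zweck", retention=0.60, correct_streak_days=2),  # learning
]

print([state(card) for card in deck])  # ['mature', 'active', 'learning']
print(vocabulary_size(deck))           # 2
```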
How to use this
Pick the 5,000-family target. Track your active vocabulary in Lemnly. Add 10–15 cards a day from things you actually want to read. In a year you’ll be at ~5,000, which is where reading becomes pleasant. Another year at half the pace adds roughly 2,000 more and puts the 8,000-family fluent-reading mark within reach.
A reasonable side-effect: you will also be able to watch the news in your target language, which is the moment most learners realise they actually got somewhere.