Research · 14 min read

How many words do you really need? The vocabulary thresholds, by language

Fluency is not a number, but coverage is. A field guide to the vocabulary sizes that unlock conversation, reading and television in 12 languages — and a roadmap for getting there with Lemnly.

Sebastian Walter
Founder, Lemnly
March 4, 2026

"How many words do I need to be fluent?" is the wrong question, but it is also the question every learner asks, so let’s answer it honestly. Fluency is a constellation of skills, not a single number. Coverage, the share of words you know in a given text, is measurable, and it tracks closely with whether you can read that text without exhausting yourself.

The three thresholds

Decades of corpus research, especially by Paul Nation and Norbert Schmitt, converge on three useful targets:

  • 2,000 word families: covers ~80% of everyday spoken text. Survival conversation. You can order food and ask directions, but you’ll lose the thread of a podcast within minutes.
  • 5,000 word families: covers ~95% of typical written text. The threshold where reading becomes pleasurable, not painful. About one running word in 20 is still unknown, but context carries most of them, so a novel sends you to the dictionary only once or twice a page.
  • 8,000–9,000 word families: the 98% mark. Only about one running word in 50 is unknown, so you can read most novels and follow most TV with only the occasional guess. You stop being a learner and start being a reader.
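To make the percentages concrete, here is a small sketch that turns a coverage figure into "one unknown word in N" and an expected number of unknown words per page. The 300-words-per-page figure is an assumption for illustration (printed pages vary); the coverage values are the three thresholds above.

```python
import math


def one_unknown_in(coverage: float) -> float:
    """On average, one running word in this many is unknown."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    if coverage == 1.0:
        return math.inf  # full coverage: no unknown words at all
    return 1.0 / (1.0 - coverage)


def unknown_per_page(coverage: float, words_per_page: int = 300) -> float:
    """Expected unknown running words on one page (300 words/page is an assumption)."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    return (1.0 - coverage) * words_per_page


THRESHOLDS = [("2,000 families", 0.80), ("5,000 families", 0.95), ("8,000-9,000 families", 0.98)]

for label, coverage in THRESHOLDS:
    print(
        f"{label}: {coverage:.0%} coverage, 1 unknown word in "
        f"{one_unknown_in(coverage):.0f}, about {unknown_per_page(coverage):.0f} per page"
    )
```

At 95% that works out to one unknown word in 20, or about 15 on a 300-word page; at 98%, one in 50, or about 6.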

What "word family" actually means

A word family is a headword (the lemma) plus its inflected forms and close derivations: run, runs, running, ran and runner are one word family. This is why the numbers feel small: they count families of related words, not individual surface forms. If you counted every surface form separately, the numbers would be 2–3× higher and just as misleading.

Lemnly counts in word families by default. When you see "3,420 cards" in your stats, that’s 3,420 distinct lemmas — not 12,000 verb conjugations.
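A minimal sketch of why family counts come out smaller than surface-form counts. The `FAMILY_OF` table is a hypothetical, hand-built mapping covering two families; a real system would use a lemmatiser plus a list of derivations, and this is not a description of Lemnly’s own implementation.

```python
import re
from typing import Tuple

# Hypothetical hand-built map from surface form to family headword.
# Forms that are not listed are treated as their own one-member family.
FAMILY_OF = {
    "run": "run", "runs": "run", "running": "run", "ran": "run", "runner": "run",
    "read": "read", "reads": "read", "reading": "read", "reader": "read", "readers": "read",
}


def count_forms_and_families(text: str) -> Tuple[int, int]:
    """Return (distinct surface forms, distinct word families) in a text."""
    forms = set(re.findall(r"[a-z]+", text.lower()))
    families = {FAMILY_OF.get(form, form) for form in forms}
    return len(forms), len(families)


sample = "The runner runs; we ran and kept running. A reader reads what readers read."
forms, families = count_forms_and_families(sample)
print(f"{forms} surface forms, {families} word families")  # 14 surface forms, 8 word families
```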

Why the numbers differ by language

Languages don’t pack information the same way. Mandarin builds new words by combining a few thousand characters, so the lemma count needed for 95% reading coverage is lower (~4,000), but you also have to learn to recognise the characters themselves, a workload the lemma count doesn’t show. German, with its compound words, gives you a lot of leverage from a smaller core: once you know Haus and Tür, Haustür comes almost free. Russian, with its rich inflection, demands more morphological awareness than the raw lemma count suggests.
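The "once you know Haus and Tür, Haustür is free" idea can be sketched as compound splitting against a set of words you already know. This is a simplified illustration: `KNOWN` is a hypothetical vocabulary, and only the linking -s- is handled, whereas real German compounds use several linking elements (-es-, -n-, -en-, -er- and more) and need a proper morphological analyser.

```python
from functools import lru_cache
from typing import FrozenSet, Optional, Tuple

# Hypothetical known vocabulary, lower-cased.
KNOWN: FrozenSet[str] = frozenset(
    {"haus", "tür", "schlüssel", "bahn", "hof", "arbeit", "zimmer"}
)
# Simplified: only the linking "s" (as in Arbeitszimmer); German has several others.
LINKERS = ("", "s")


def split_compound(word: str, known: FrozenSet[str] = KNOWN) -> Optional[Tuple[str, ...]]:
    """Split a compound into known parts, or return None if it can't be done."""
    w = word.lower()

    @lru_cache(maxsize=None)
    def split_from(i: int) -> Optional[Tuple[str, ...]]:
        if i == len(w):
            return ()
        # Try the longest known part first.
        for j in range(len(w), i, -1):
            part = w[i:j]
            if part not in known:
                continue
            for linker in LINKERS:
                k = j + len(linker)
                # A linker must be present in the word and be followed by another part.
                if linker and (k >= len(w) or not w.startswith(linker, j)):
                    continue
                rest = split_from(k)
                if rest is not None:
                    return (part,) + rest
        return None

    parts = split_from(0)
    # One known word on its own is not a compound.
    return parts if parts is not None and len(parts) > 1 else None


print(split_compound("Haustür"))           # ('haus', 'tür')
print(split_compound("Haustürschlüssel"))  # ('haus', 'tür', 'schlüssel')
print(split_compound("Arbeitszimmer"))     # ('arbeit', 'zimmer')
print(split_compound("Schreibtisch"))      # None
```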

Approximate 95% reading thresholds

Language | Approx. word families for 95% coverage | Notes
--- | --- | ---
English | ~5,000 | Latin/Germanic split adds difficulty
Spanish | ~5,000 | Regular morphology, transparent
French | ~5,200 | Many Latinate cognates for English speakers
Italian | ~5,300 | Very phonetic, fast progress
Portuguese | ~5,200 | Similar profile to Spanish
German | ~5,500 | Compounds count; once you have parts, you get wholes
Dutch | ~5,400 | Heavy compound usage like German
Polish | ~6,000 | Seven cases inflate apparent count
Russian | ~6,500 | Rich morphology, perfective/imperfective pairs
Turkish | ~6,000 | Agglutinative: lemmas are smaller chunks
Mandarin (lemmas) | ~4,000 | Plus ~3,000 characters to recognise
Japanese | ~5,500 | Plus ~2,000 kanji for fluent reading

The honest caveats

These numbers measure receptive vocabulary: the words you recognise when you meet them. They say nothing about retrieval speed, grammar, listening comprehension, or the courage to actually open your mouth in a café. In conversation, a learner with 4,000 active words and a hundred hours of listening practice will almost always outperform one with 6,000 cards and no listening.

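Also: the curves are not linear. Word frequency falls off steeply, so each extra word family covers less text than the one before it. Going from 2,000 to 5,000 (80% to 95% coverage) is a massive jump in capability: you go from survival to reading. Going from 5,000 to 8,000 adds only about three percentage points of coverage, so it feels like a much smaller jump per card learned, but it’s the difference between "I can read" and "I read for pleasure."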
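Using the article’s own thresholds, the diminishing returns are easy to put in numbers. The sketch below computes the coverage percentage points gained per 1,000 word families between thresholds; taking 8,500 as the midpoint of the 8,000–9,000 range is an assumption.

```python
from typing import List, Tuple

# (word families known, share of running words covered), from the thresholds above.
# 8,500 stands in for the 8,000-9,000 range; that midpoint is an assumption.
THRESHOLDS: List[Tuple[int, float]] = [(0, 0.0), (2_000, 0.80), (5_000, 0.95), (8_500, 0.98)]


def gains_per_thousand(points: List[Tuple[int, float]]) -> List[Tuple[int, int, float]]:
    """Coverage percentage points gained per 1,000 extra families between thresholds."""
    gains = []
    for (v0, c0), (v1, c1) in zip(points, points[1:]):
        points_gained = (c1 - c0) * 100
        gains.append((v0, v1, points_gained / ((v1 - v0) / 1_000)))
    return gains


for v0, v1, gain in gains_per_thousand(THRESHOLDS):
    print(f"{v0:,} -> {v1:,} families: +{gain:.1f} points per 1,000 families")
```

The first 2,000 families buy about 40 coverage points per thousand, the next 3,000 about 5, and the step to 98% less than 1.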

How to use Lemnly to hit each threshold

A Lemnly-shaped roadmap, by threshold:

  • 0 → 2,000 (survival). Start with a foundation deck — Lemnly’s public "Top 2k" decks per language are free and frequency-ordered. Add 20 new cards a day, review for 10 minutes. You’ll hit 2,000 in about 100 days.
  • 2,000 → 5,000 (pleasurable reading). Switch modes. Stop adding from frequency lists, start importing real text. One article a day via URL, one chapter of a graded reader a week. The cards now come from your interests, not from a list. Pace: about 10 new cards a day. Expect 9–12 months.
  • 5,000 → 8,000 (fluent reading). Read full novels, harder articles, professional content. The yield per import drops, since you already know most words, but the cards you do get are the high-value rare ones. Pace: 5 new cards a day. Expect about 20 months (3,000 cards at 5 a day is 600 days).
  • 8,000+. You don’t need a plan anymore. Read. Import the occasional article when a word catches you. Review for five minutes a day. You’re a reader.
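The timings in the roadmap come down to simple division: new words needed, divided by new cards per day. A sketch, assuming every new card eventually becomes active and you never skip a day (neither holds exactly in practice):

```python
import math

# Phases and paces from the roadmap above: (name, start, target, new cards per day).
PHASES = [
    ("0 -> 2,000 (survival)", 0, 2_000, 20),
    ("2,000 -> 5,000 (pleasurable reading)", 2_000, 5_000, 10),
    ("5,000 -> 8,000 (fluent reading)", 5_000, 8_000, 5),
]
DAYS_PER_MONTH = 365 / 12


def days_to_target(start: int, target: int, new_per_day: float) -> int:
    """Days of adding new cards needed, assuming every card sticks and no days are skipped."""
    if new_per_day <= 0:
        raise ValueError("new_per_day must be positive")
    return math.ceil(max(target - start, 0) / new_per_day)


for name, start, target, pace in PHASES:
    days = days_to_target(start, target, pace)
    print(f"{name}: {days} days, about {days / DAYS_PER_MONTH:.0f} months at {pace}/day")
```

At the stated paces this gives 100, 300 and 600 days, so the third phase comes to about 20 months.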

What "active" vocabulary means in your stats

Lemnly distinguishes three states for any lemma:

  • Learning — in your deck, but not yet stable. Don’t count it.
  • Active — stable, in regular review, retention above 80%. This is the number to track.
  • Mature — recalled correctly for 30+ days running. Effectively yours.

When you see "your vocabulary: 4,820" in your Lemnly dashboard, that’s active + mature. It’s the number that maps to the coverage thresholds above.
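The three states can be sketched as a small classifier, with the dashboard number as the count of active plus mature cards. Everything here is hypothetical: the `Card` fields, the reading of "30+ days running" as thirty or more days since the card was last forgotten, and the order of the checks are illustrative assumptions, not Lemnly’s actual scheduler logic. "In regular review" is not modelled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class State(Enum):
    LEARNING = "learning"
    ACTIVE = "active"
    MATURE = "mature"


@dataclass
class Card:
    lemma: str
    retention: float          # hypothetical: share of recent reviews recalled correctly, 0..1
    correct_streak_days: int  # hypothetical: days since the card was last forgotten


def classify(card: Card) -> State:
    """Map a card to one of the three states described above (illustrative thresholds)."""
    if card.correct_streak_days >= 30:   # "recalled correctly for 30+ days running"
        return State.MATURE
    if card.retention > 0.80:            # "stable ... retention above 80%"
        return State.ACTIVE
    return State.LEARNING                # not yet stable: not counted


def vocabulary_size(cards: Iterable[Card]) -> int:
    """Active + mature cards: the number the dashboard reports."""
    return sum(1 for c in cards if classify(c) in (State.ACTIVE, State.MATURE))


deck = [
    Card("Haus", retention=0.95, correct_streak_days=45),
    Card("Tür", retention=0.85, correct_streak_days=12),
    Card("Schlüssel", retention=0.60, correct_streak_days=3),
]
print([classify(c).value for c in deck])  # ['mature', 'active', 'learning']
print(vocabulary_size(deck))              # 2
```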

How to use this

Pick the 5,000-family target. Track your active vocabulary in Lemnly. Add 10–15 cards a day from things you actually want to read. That’s 3,650–5,475 new cards a year, so at the upper end of that pace you’ll be at ~5,000 within a year: pleasant reading. Roughly another 13–20 months at half the pace (5–7.5 cards a day) takes you to the 8,000 mark and fluent reading.

A welcome side effect: you will also be able to follow the news in your target language, which is the moment most learners realise they actually got somewhere.

The article you’d normally skim?
Paste it in tonight.

Free while in beta. Add your first source in under a minute and see the words you actually need.

Try the beta. No credit card · English ↔ German & Spanish today