High Frequency Word Lists

This is a part of my Language Learning series.

What is a High Frequency Word List?

The basic idea of a high frequency word list rests on the fact that most of the words actually used in a language come from a small subset of all the words that exist in it.

The specific numbers vary depending on the source being used (spoken language, newspapers, books, textbooks, radio, etc.), but for the most part they follow a general 80/20 Pareto principle: if you learn around 1,000 words of a language, you can understand around 65 – 80% of the words used in that language.
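To make that coverage idea concrete, here is a rough sketch of how you might measure it yourself on any text you have lying around. The function name `coverage` and the simple letters-only tokenizer are my own assumptions for illustration (the tokenizer only fits languages written with spaces and the Latin alphabet), and the number you get will depend entirely on the text you feed it.

```python
import re
from collections import Counter


def coverage(text, top_n):
    """Return the share of all word occurrences in `text` that are
    accounted for by its `top_n` most frequent words.

    Hypothetical helper: the regex tokenizer is a simplifying assumption.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    # Add up the occurrences of the top_n most common words.
    covered = sum(count for _, count in counts.most_common(top_n))
    return covered / len(words)


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    # "the" and "and" account for 5 of the 8 words in this tiny sample.
    print(coverage(sample, 2))
```

Run it against a long article or a book chapter with `top_n` set to 1,000 and you can see for yourself where your source lands relative to that 65 – 80% range.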

I used to get really excited by this idea because it meant that if I just learned those 1,000 words I would be able to understand most of the language. Talk about a shortcut!

Plus, I figured that with mnemonic devices or something like an SRS (Spaced Repetition System), which helps you embed items into your memory faster, memorizing those 1,000 words would only take a matter of weeks.

Why Most High Frequency Word Lists Don’t Work

This seems to make total sense, but unfortunately I’ve found that memorizing lists of words through an SRS application (like Anki) is not very effective.

For one primary reason: context.

The words you are studying aren’t based on context or a known situation. They’re an unrelated sequence of words that don’t have any relationship to the words before or after them.

It would be like trying to learn to count to ten, but doing it out of order. The whole reason for learning numbers in sequence is that each number represents a quantitative piece of contextual information which is related to the numbers before and after it.

In other words, the numbers tell a story. (Albeit not a very interesting one.)

Evaluating this approach

In my main “Language Learning” post I said that I have two main things that I use to evaluate the effectiveness of these methods:

  1. Is this a natural way for human beings to learn a language? In other words, is this how we learned our first language(s)?
  2. Have I found this method to be effective in my own studies? What is my own experience with this method?

First, memorizing lists of high frequency words is definitely not a natural way for human beings to learn a language. That isn’t how I learned my first language (technically that was Japanese, but in reality it is English). I learned vocabulary in the context of how those words were used. The words made sense because they were part of the “story” of that situation.

Second, as I stated earlier, I haven’t found this method to be particularly effective for becoming proficient in a language. And I think that is because I hadn’t yet figured out a better way to use high frequency word lists.

A better way to use High Frequency Word Lists

I certainly think there is a place for using SRS and frequency word lists. They can be a nice and effective supplement to learning a language.

With one caveat.

And the caveat is related to the main issue with frequency word lists: the words on them only show up at a high frequency for other people (or sources). They have nothing to do with your experience in the language.

Instead, why not create your own high frequency list of words?

As you are studying the language, write down the words that you keep hearing or reading. Then, add those words to your SRS.

You’ll find it considerably easier to remember those words because you’ll remember the context for how they were used and in what situation they showed up.
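If you like keeping this kind of thing on a computer, here is a minimal sketch of what a personal frequency list could look like. Everything in it is hypothetical (the `PersonalWordLog` class, its method names, and the sample words and situations); it is just one way to record each word together with where you ran into it, and then pull out the ones that keep coming back.

```python
from collections import defaultdict


class PersonalWordLog:
    """Hypothetical log of words you encounter, with the context of each
    encounter kept alongside the word."""

    def __init__(self):
        # word -> list of contexts in which it was seen
        self._contexts = defaultdict(list)

    def note(self, word, context):
        """Record one encounter of `word` and where it happened."""
        self._contexts[word.strip().lower()].append(context)

    def frequent(self, min_count=2):
        """Return (word, times seen, first context) for every word seen at
        least `min_count` times, most frequent first."""
        rows = [
            (word, len(contexts), contexts[0])
            for word, contexts in self._contexts.items()
            if len(contexts) >= min_count
        ]
        rows.sort(key=lambda row: (-row[1], row[0]))
        return rows


if __name__ == "__main__":
    log = PersonalWordLog()
    # Made-up example encounters, purely for illustration.
    log.note("taberu", "ordering lunch with a friend")
    log.note("Taberu", "cooking show on TV")
    log.note("eki", "train announcement")
    for word, seen, context in log.frequent():
        print(word, seen, context)
```

The words returned by `frequent()` are the ones to turn into SRS cards, and the saved context gives you something to put on the card besides a bare translation.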

I know … “But, Mark, that is more work! Isn’t it easier to just use someone else’s list?”

Easier? Yes. More effective? No.

Believe me, I hear you. I’d love to take the lazy approach too. But as I keep being reminded by “life”, the lazy approach and attractive shortcuts often end up taking more time than just putting in the work and focusing on the task at hand would have in the first place.

SRS and high frequency word lists are great. But you have to put in the work to make them relevant to you, and not some arbitrary source of vocabulary.