Word frequency data

Corpus of Contemporary American English




This site contains what we believe is the most accurate frequency data for English. The data comes in a number of different formats (see samples: 100,000 and 60,000 word lists, and a comparison of the two lists).

The 5,000-60,000 word lists are available as a simple word list, as a list with frequency by genre, or as an eBook or printed frequency dictionary. The 100,000 word list gives detailed frequency information for many genres in several different corpora. In addition to word frequency data, you can also download up to 155 million n-grams and 4.3 million collocates.

Any frequency list is only as good as the corpus (collection of texts) that it is based on. The 5,000-60,000 word lists are based on the only large, genre-balanced, up-to-date corpus of American English: the 450 million word Corpus of Contemporary American English (COCA). The 100,000 word list supplements this COCA data with detailed frequency data from the 400 million word Corpus of Historical American English, the British National Corpus, and the Corpus of American Soap Operas (for very informal language).