Word frequency data


Note: this data is based on corpora that were created solely by Mark Davies, Professor of Linguistics at Brigham Young University. As the result of an agreement between BYU and Mark Davies, all transactions regarding payments and licenses for this data are made solely with Mark Davies, rather than with BYU.

The COCA 100,000 word list comes in two formats (both included in the purchase price): text and Excel. The 20,000-60,000 lemma lists can be purchased in several different formats:

 1.  Text files. Click on the appropriate link in the blue sections (e.g. 90).  With these versions, you can view, search, print, export, and re-use data.
 2.  eBook. With this version, you can view and search all of the data in the eBook, but you cannot print or export from the eBook.
 
Licensing: A = academic, C = commercial. Click on a heading for samples.

                 Format
# words          Wordlist              Wordlist + genre frequency    eBook
5,000 lemmas     Free (A)  $90 (C)     $45 (A)   $90 (C)             $19.95
20,000 lemmas    $60 (A)   $120 (C)    $80 (A)   $160 (C)            $39.95
60,000 lemmas    $90 (A)   $180 (C)    $125 (A)  $250 (C)            -----
100,000 words    $125 (A)  $250 (C)
                 (price includes both text and Excel files, with 200,000 links to COCA queries)

See comparison of 60k and 100k lists

Not sure which size of list to purchase (5,000, 20,000, or 60,000)? Take a look at this list to see which words appear at the different levels. There's no need to buy a larger list if a smaller one is adequate for your needs.