Word frequency data


COCA+ 100k word forms list (compare to COCA 60k lemmas list)

The 100,000 word list is the largest carefully-corrected, frequency-based word list of English available anywhere. Take a look at 5,000 randomly-selected words from the list (every twentieth word, 1 to 100,000) to check the accuracy of the list. We believe that no other word list comes close in terms of size and accuracy.

For each of the 100,000 words, the spreadsheet shows more than fifty pieces of frequency data (see a sample: xlsx, xls), including the frequency in COCA, the BNC, SOAP, and COHA, as well as in portions of each corpus (e.g. spoken, fiction, magazines, newspapers, and academic). There are also links in the spreadsheet that let you see, with one click, each of the 100,000 words in the online corpora.

And because the data is in an Excel spreadsheet, you can sort by, limit by, and compare any of these 50+ columns. For example, you could select only nouns or verbs or words with certain substrings (e.g. certain prefixes or suffixes). You could also sort by the frequency in BNC Academic, or COCA Magazines, or the ratio of COCA and the BNC (e.g. finding "American" or "British" words), or the ratio of COCA and COHA (e.g. "new" words in COCA). See some samples of the different kinds of searches that you can do.
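
The same kinds of searches can also be done outside Excel, for instance after exporting the sheet to CSV. The following is only a minimal sketch in Python. The column names (word, pos, coca_pm, bnc_pm), the part-of-speech codes, and every number in ROWS are invented placeholders for illustration and are not taken from the actual spreadsheet. It also assumes the frequency columns are already normalized (per million words), so that a ratio between two corpora of different sizes is meaningful.

```python
# Hypothetical stand-in for a few spreadsheet rows. All column names and
# numbers are invented placeholders, not values from the real list.
ROWS = [
    {"word": "color", "pos": "n", "coca_pm": 90.0, "bnc_pm": 1.0},
    {"word": "colour", "pos": "n", "coca_pm": 1.0, "bnc_pm": 80.0},
    {"word": "happiness", "pos": "n", "coca_pm": 20.0, "bnc_pm": 20.0},
    {"word": "realize", "pos": "v", "coca_pm": 60.0, "bnc_pm": 15.0},
    {"word": "darkness", "pos": "n", "coca_pm": 30.0, "bnc_pm": 30.0},
]


def select(rows, pos=None, suffix=None):
    """Keep rows matching a part of speech and/or a word-final substring."""
    return [
        r for r in rows
        if (pos is None or r["pos"] == pos)
        and (suffix is None or r["word"].endswith(suffix))
    ]


def ratio(row, numerator, denominator, smoothing=0.5):
    """Ratio of two frequency columns; smoothing avoids division by zero."""
    return (row[numerator] + smoothing) / (row[denominator] + smoothing)


def sort_by_ratio(rows, numerator, denominator):
    """Sort rows so the highest numerator/denominator ratio comes first."""
    return sorted(
        rows,
        key=lambda r: ratio(r, numerator, denominator),
        reverse=True,
    )


# Nouns ending in "-ness".
ness_nouns = [r["word"] for r in select(ROWS, pos="n", suffix="ness")]

# Words ranked from most "American" to most "British" in this toy data.
by_american = [r["word"] for r in sort_by_ratio(ROWS, "coca_pm", "bnc_pm")]
```

Swapping in other column pairs (for example a COCA column against a COHA column, if exported under names of your choosing) would rank words by how much more frequent they are in the first corpus than in the second. The smoothing constant is an arbitrary choice here; a larger value damps the ratios of very rare words.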

[ Purchase data ]