COCA+ 100k word forms list (compare to the COCA 60k lemmas list)
The 100,000-word list is the largest carefully corrected,
frequency-based word list of English available anywhere. To check the
accuracy of the list, take a look at a sample of 5,000 words drawn from it
(every twentieth word, 1 to 100,000). We believe that no other word list
comes close in terms of size and accuracy.
For each of the 100,000 words, the
spreadsheet shows more than fifty pieces of frequency data (see a sample:
xls), including the frequency in
COCA, the BNC, and
COHA, as well as in sections of each
corpus (e.g. spoken, fiction, magazines, newspapers, and academic). The
spreadsheet also has links that let you see, with one click, each of
the 100,000 words in the online corpora.
And because the data is in an Excel spreadsheet,
you can sort by, limit by, and compare any of these 50+ columns. For example,
you could select only nouns or verbs or words with certain substrings (e.g.
certain prefixes or suffixes). You could also sort by the frequency in BNC
Academic or COCA Magazines, or by the ratio of COCA frequency to BNC frequency
(e.g. to find "American" or "British" words), or by the ratio of COCA frequency
to COHA frequency (e.g. to find "new" words in COCA). See some samples of the different kinds
of searches that you can do.
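
The same kinds of searches can also be run outside Excel once the data is exported. The Python sketch below is only an illustration: the column names (word, pos, coca_pm, bnc_pm, coha_pm), the helper functions, and all of the frequency values are invented for the example and are not the actual headers or figures in the spreadsheet. It shows filtering by part of speech or suffix, sorting by one frequency column, and ranking by the ratio of two columns.

```python
# Toy rows with invented values. The column names are assumptions made for
# this sketch, not the actual headers of the COCA+ spreadsheet.
ROWS = [
    {"word": "sidewalk",  "pos": "n", "coca_pm": 12.0, "bnc_pm": 0.2,  "coha_pm": 9.0},
    {"word": "pavement",  "pos": "n", "coca_pm": 6.0,  "bnc_pm": 15.0, "coha_pm": 7.0},
    {"word": "website",   "pos": "n", "coca_pm": 40.0, "bnc_pm": 1.0,  "coha_pm": 0.5},
    {"word": "realize",   "pos": "v", "coca_pm": 90.0, "bnc_pm": 3.0,  "coha_pm": 60.0},
    {"word": "happiness", "pos": "n", "coca_pm": 20.0, "bnc_pm": 18.0, "coha_pm": 30.0},
]


def select(rows, pos=None, prefix="", suffix=""):
    """Keep rows matching a part of speech and/or a prefix or suffix."""
    return [
        r for r in rows
        if (pos is None or r["pos"] == pos)
        and r["word"].startswith(prefix)
        and r["word"].endswith(suffix)
    ]


def ratio(row, numerator, denominator, floor=0.01):
    """Ratio of two frequency columns; the floor avoids division by zero."""
    return row[numerator] / max(row[denominator], floor)


def sort_by_ratio(rows, numerator, denominator):
    """Rows ordered from the highest to the lowest ratio."""
    return sorted(rows, key=lambda r: ratio(r, numerator, denominator), reverse=True)


nouns = select(ROWS, pos="n")                            # only nouns
ness_words = select(ROWS, suffix="ness")                 # words ending in -ness
by_bnc = sorted(ROWS, key=lambda r: r["bnc_pm"], reverse=True)  # sort by one column
american = sort_by_ratio(nouns, "coca_pm", "bnc_pm")     # relatively "American" nouns first
newer = sort_by_ratio(ROWS, "coca_pm", "coha_pm")        # relatively "new" words first

for label, result in [("American-leaning nouns", american), ("newer words", newer)]:
    print(label + ":", [r["word"] for r in result])
```

The floor in ratio() is one simple way to handle words that are missing from the second corpus; a different smoothing value would change the ranking of very rare words.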