The 100,000 word list is the largest carefully corrected,
frequency-based word list of English available anywhere. To check the
accuracy of the list, take a look at a sample of 5,000 words drawn from it
(every twentieth word, 1 to 100,000). We believe that no other word list
comes close in terms of size and accuracy.
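As a rough illustration of the arithmetic, taking every twentieth entry from a list of 100,000 yields 5,000 entries. The sketch below uses placeholder strings rather than real words, and it assumes the sample starts at rank 20; the page does not say whether the actual sample starts at rank 1 or rank 20.

```python
# Placeholder entries standing in for the real, rank-ordered list.
ranked_words = [f"word{rank}" for rank in range(1, 100_001)]

STEP = 20
# Assumption: start at index STEP - 1 so the sample holds ranks 20, 40, ..., 100,000.
sample = ranked_words[STEP - 1::STEP]

print(len(sample))
print(sample[0], sample[-1])
```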
For each of the 100,000 words, the spreadsheet shows more than fifty pieces
of frequency data (see a sample: xlsx, xls), including the frequency in COCA,
the BNC, SOAP, and COHA, as well as in portions of each corpus (e.g. spoken,
fiction, magazines, newspapers, and academic). The spreadsheet also contains
links that let you see, with one click, each of the 100,000 words in the
online corpora.
And because the data is in an Excel spreadsheet,
you can sort by, limit by, and compare any of these 50+ columns. For example,
you could select only nouns or verbs, or words with certain substrings (e.g.
certain prefixes or suffixes). You could also sort by the frequency in BNC
Academic or COCA Magazines, or by the ratio of COCA to BNC frequency (e.g. to
find "American" or "British" words), or by the ratio of COCA to COHA frequency
(e.g. to find "new" words in COCA). See some samples of the different kinds
of searches that you can do.
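The same kinds of filtering, sorting, and ratio comparisons can also be done outside Excel. The following is a minimal Python sketch, not the layout of the actual spreadsheet: the column names (`pos`, `coca`, `bnc`, `coha`), the `ratio` helper, and all of the numbers are invented placeholders, and the values are assumed to be frequencies already normalized for corpus size.

```python
# Invented placeholder rows: column names and numbers are NOT from the real list.
rows = [
    {"word": "color", "pos": "n", "coca": 900, "bnc": 10, "coha": 800},
    {"word": "colour", "pos": "n", "coca": 20, "bnc": 700, "coha": 60},
    {"word": "organize", "pos": "v", "coca": 500, "bnc": 50, "coha": 450},
    {"word": "blog", "pos": "n", "coca": 300, "bnc": 40, "coha": 2},
    {"word": "happiness", "pos": "n", "coca": 400, "bnc": 380, "coha": 420},
]


def ratio(row, numerator, denominator, smoothing=1.0):
    """Ratio of two frequency columns; smoothing avoids division by zero."""
    return (row[numerator] + smoothing) / (row[denominator] + smoothing)


# Limit by part of speech, or by a substring such as a suffix.
nouns = [r["word"] for r in rows if r["pos"] == "n"]
ness_words = [r["word"] for r in rows if r["word"].endswith("ness")]

# Sort by a single frequency column, highest first.
by_bnc = [r["word"] for r in sorted(rows, key=lambda r: r["bnc"], reverse=True)]

# Compare columns: a high COCA/BNC ratio suggests an "American" word,
# a low one a "British" word; a high COCA/COHA ratio suggests a "new" word.
most_american = max(rows, key=lambda r: ratio(r, "coca", "bnc"))["word"]
most_british = min(rows, key=lambda r: ratio(r, "coca", "bnc"))["word"]
newest = max(rows, key=lambda r: ratio(r, "coca", "coha"))["word"]

print(nouns, ness_words, by_bnc, most_american, most_british, newest)
```

The smoothing constant is a design choice for this sketch only: it keeps words with a frequency of zero in one corpus from causing a division error.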