This site contains what we believe is the most accurate frequency data for English, and it comes in a number of different formats (see samples: 100,000 and 60,000 word lists). The 5,000-60,000 word lists are available as a simple word list, as frequency by genre, or as an eBook or a printed frequency dictionary. For the 100,000 word list, you can see detailed frequency information for many genres in several different corpora. In addition to word frequency data, you can also download up to 155 million n-grams and 4.3 million collocates.

Any frequency list is only as good as the corpus (collection of texts) that it is based on. The 5,000-60,000 word lists are based on the only large, genre-balanced, up-to-date corpus of American English: the 450 million word Corpus of Contemporary American English (COCA). The 100,000 word list supplements this COCA data with detailed frequency data from the 400 million word Corpus of Historical American English, the British National Corpus, and the Corpus of American Soap Operas (for very informal language).