This site contains what we believe is the
frequency data for English, and it comes in a number of different formats (see samples: the 100,000- and 60,000-word lists,
and a comparison of the two lists).
For the 5,000- to 60,000-word lists, you can download a
simple word list or frequency by genre, or get the data as an
eBook or a printed
frequency dictionary. For the 100,000-word list, you can see detailed frequency
information for many genres in several different corpora.
In addition to word frequency data, you can also download up to 155 million
n-grams, and 4.3 million
Any frequency list is
only as good as the corpus (collection of texts) that it is based
on. The 5,000- to 60,000-word lists are based on the only large, genre-balanced,
up-to-date corpus of American English: the 450-million-word
Corpus of Contemporary American English (COCA). The 100,000-word
list supplements this COCA data with detailed frequency data from
the 400-million-word Corpus of
Historical American English, the
British National Corpus,
and the Corpus of American
Soap Operas (for very informal language).
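To illustrate why corpus composition matters for a "frequency by genre" list, here is a minimal sketch that counts words per genre and normalizes each count to a rate per million words, a common convention for comparing genres or corpora of different sizes. The toy corpus, the whitespace tokenizer, and the function names are hypothetical and chosen only for illustration; this is not how the COCA-based lists were actually produced.

```python
from collections import Counter

# Hypothetical toy corpus (invented for illustration): genre -> list of texts.
TOY_CORPUS = {
    "spoken": ["well you know I think so", "yeah I think you know"],
    "academic": ["the data suggest that the model is robust"],
}


def tokenize(text: str) -> list[str]:
    # Assumption: naive lowercase + whitespace split; real corpora use
    # proper tokenization and lemmatization.
    return text.lower().split()


def frequency_by_genre(corpus: dict[str, list[str]]) -> dict[str, Counter]:
    """Count token frequencies separately for each genre."""
    counts: dict[str, Counter] = {}
    for genre, texts in corpus.items():
        genre_counts: Counter = Counter()
        for text in texts:
            genre_counts.update(tokenize(text))
        counts[genre] = genre_counts
    return counts


def per_million(counts: Counter, word: str) -> float:
    """Normalize a raw count to occurrences per million tokens."""
    total = sum(counts.values())
    return counts[word] / total * 1_000_000 if total else 0.0


def overall_list(counts: dict[str, Counter], n: int = 5) -> list[tuple[str, int]]:
    """Merge genre counts into one list of the n most frequent words."""
    merged = sum(counts.values(), Counter())
    return merged.most_common(n)


if __name__ == "__main__":
    by_genre = frequency_by_genre(TOY_CORPUS)
    for genre, c in by_genre.items():
        print(genre, round(per_million(c, "think"), 2))
    print(overall_list(by_genre))
```

Normalizing per million makes a word's rate comparable across genres of unequal size; if one genre dominated the corpus, its vocabulary would also dominate the merged list, which is one reason a genre-balanced corpus yields a more representative overall ranking.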