Word frequency data


This site contains what is probably the most accurate word frequency data for English. The first set of wordlists is based on the 14-billion-word iWeb corpus, one of only three corpora from the Web that are 10 billion words in size or larger (and the only such corpus with carefully corrected wordlists). The second set of wordlists is based on the Corpus of Contemporary American English (COCA), now 560 million words in size, which is the largest genre-balanced corpus of English. In addition to word frequency data, you can also download n-grams and collocates from both iWeb and COCA.

Corpus: iWeb (14 billion words from the Web)
Best for: Web and tech language (compare to COCA)
Samples: top 60,000 lemmas and ~100,000 word forms (both sets included for the same price)

Corpus: COCA (the largest genre-balanced corpus of English)
Best for: Wide variety of genres: spoken, fiction, popular magazines, newspaper, academic
Samples: Top 20,000 or 60,000 lemmas: simple word list, frequency by genre, or as an eBook. Top 100,000 word forms. Also contains information on COCA genres, and frequency in the BNC (British), SOAP (informal) and COHA (historical)
 
iWeb sample:

rank    lemma / word     PoS   freq     range   range10
7371    brew             v     94904    0.06    0.01
17331   useable          j     17790    0.02    0.00
27381   uppercase        n     5959     0.02    0.00
37281   half-naked       j     2459     0.00    0.00
47381   bellhop          n     1106     0.00    0.00
57351   tetherball       n     425      0.00    0.00

COCA sample:

rank    lemma / word     PoS   freq     dispersion
7309    attic            n     2711     0.91
17311   tearful          j     542      0.93
27303   tailgate         v     198      0.85
37310   hydraulically    r     78       0.83
47309   unsparing        j     35       0.83
57309   embryogenesis    n     22       0.66
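
For readers who want to work with rows like these programmatically, here is a minimal Python sketch that parses the sample rows that have a dispersion column. It assumes whitespace-separated fields in the order shown above (rank, lemma, PoS, freq, dispersion); the layout of the actual download files is not described here and may differ. The names `Entry`, `SAMPLE` and `parse_rows` are made up for this illustration.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One row of the sample list (hypothetical structure for this sketch)."""
    rank: int
    lemma: str
    pos: str
    freq: int
    dispersion: float


# Sample rows copied from the page; the real files may use another layout.
SAMPLE = """\
rank lemma / word PoS freq dispersion
7309 attic n 2711 0.91
17311 tearful j 542 0.93
27303 tailgate v 198 0.85
37310 hydraulically r 78 0.83
47309 unsparing j 35 0.83
57309 embryogenesis n 22 0.66
"""


def parse_rows(text: str) -> list[Entry]:
    """Parse whitespace-separated rows, skipping anything without 5 fields."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            # Skips the header line (7 tokens) and blank lines.
            continue
        rank, lemma, pos, freq, dispersion = fields
        entries.append(Entry(int(rank), lemma, pos, int(freq), float(dispersion)))
    return entries


entries = parse_rows(SAMPLE)

# Example: keep only the lemmas tagged "n" in the PoS column.
nouns = [e.lemma for e in entries if e.pos == "n"]
```

The same approach should carry over to the rows with `range` and `range10` columns by widening the field count and adding one more float field.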