Word frequency data

from the 14 billion word iWeb corpus


You can purchase a frequency listing of the top 60,000 words (lemmas) in the 14 billion word iWeb corpus. No other frequency list of English is based on a corpus that is even close to iWeb in size. In addition, there are some important differences between the iWeb list and the COCA-based list.

When you purchase the data, you receive two files: a lemmatized list (the first table below) and a "word forms" list (the second table below). The word forms list contains approximately 97,000 word forms.

An explanation of the columns for both types of data is found in the downloadable sample (which contains every tenth word in the 60,000 word lemmatized list).
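To make the size of the sample concrete, here is a minimal sketch of what "every tenth word" implies for a 60,000-entry list. It assumes the sample keeps ranks 10, 20, 30, and so on; the page does not state the starting offset, and `every_tenth` is a hypothetical helper, not part of the data.

```python
def every_tenth(ranks, step=10):
    """Keep every rank divisible by `step` (assumed offset; not stated in the source)."""
    return [rank for rank in ranks if rank % step == 0]


# Ranks 1..60,000 stand in for the full lemmatized list.
full_ranks = range(1, 60_001)
sample_ranks = every_tenth(full_ranks)

print(len(sample_ranks))   # 6000
print(sample_ranks[:3])    # [10, 20, 30]
```

Under that assumption the sample would hold 6,000 of the 60,000 lemmas.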

Lemmatized

rank   lemma      PoS  lemFreq  range  range10  caps
11610  hamper     v    40995    0.04   0.01     0.01
11611  toggle     v    40981    0.04   0.01     0.08
11612  bilingual  j    40979    0.03   0.01     0.2
11613  scuba      n    40974    0.02   0.01     0.15
11614  graphite   n    40961    0.02   0.01     0.26
11615  lecture    v    40957    0.04   0.01     0.03
11616  seismic    j    40932    0.02   0.01     0.14
11617  farm       v    40921    0.03   0.01     0.05
11618  tavern     n    40918    0.03   0.01     0.41
11619  orbital    j    40916    0.02   0.01     0.14
11620  frenzy     n    40897    0.16   0.01     0.18

Word forms

lemRank  lemma      PoS  wordForm    wordFreq
11610    hamper     v    hampered    21110
11610    hamper     v    hamper      14194
11610    hamper     v    hampering   5198
11610    hamper     v    hampers     493
11611    toggle     v    toggle      30472
11611    toggle     v    toggled     4302
11611    toggle     v    toggles     3355
11611    toggle     v    toggling    2852
11612    bilingual  j    bilingual   39468
11612    bilingual  j    bi-lingual  1511
11613    scuba      n    scuba       40974
11614    graphite   n    graphite    40961
11615    lecture    v    lectured    13409
11615    lecture    v    lecture     11695
11615    lecture    v    lecturing   10749
11615    lecture    v    lectures    5104
11616    seismic    j    seismic     40932
11617    farm       v    farmed      20092
11617    farm       v    farming     11587
11617    farm       v    farm        7772
11617    farm       v    farms       1470
11618    tavern     n    tavern      34616
11618    tavern     n    taverns     6302
11619    orbital    j    orbital     40916
11620    frenzy     n    frenzy      40025
11620    frenzy     n    frenzies    872
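In the sample rows above, the word-form frequencies for a lemma add up to that lemma's lemFreq (for example, hampered + hamper + hampering + hampers = 40995). The following is a minimal sketch of that check, using a few rows copied from the two tables. The whitespace-separated layout is an assumption based on how the rows appear here; the delimiter and header format of the purchased files may differ, and `sum_word_forms` is a hypothetical helper.

```python
from collections import defaultdict

# Rows copied from the word forms table: lemRank, lemma, PoS, wordForm, wordFreq
WORD_FORMS = """\
11610 hamper v hampered 21110
11610 hamper v hamper 14194
11610 hamper v hampering 5198
11610 hamper v hampers 493
11618 tavern n tavern 34616
11618 tavern n taverns 6302
11620 frenzy n frenzy 40025
11620 frenzy n frenzies 872
"""

# lemFreq values copied from the lemmatized table, keyed by (lemma, PoS)
LEM_FREQ = {
    ("hamper", "v"): 40995,
    ("tavern", "n"): 40918,
    ("frenzy", "n"): 40897,
}


def sum_word_forms(text):
    """Sum wordFreq per (lemma, PoS) from whitespace-separated rows (assumed layout)."""
    totals = defaultdict(int)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _lem_rank, lemma, pos, _word_form, word_freq = line.split()
        totals[(lemma, pos)] += int(word_freq)
    return dict(totals)


totals = sum_word_forms(WORD_FORMS)
for key, lem_freq in LEM_FREQ.items():
    # Print the summed word-form frequency and whether it matches lemFreq
    print(key, totals[key], totals[key] == lem_freq)
```

Keying on the (lemma, PoS) pair rather than the lemma alone keeps entries such as farm as a verb separate from any other part of speech the same spelling may have elsewhere in the list. The equality is only verified for the rows shown on this page; whether it holds for every lemma in the full data is not stated.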