iWeb (released in 2018) contains about 14 billion words of text from an
extremely broad range of websites. iWeb is one of only three corpora from the
web that are 10 billion words in size or larger, and it is the only such corpus
with carefully corrected wordlists. iWeb is about 25 times as large as COCA (the
other main source for the word frequency data), and there are some important
differences between the iWeb list and the COCA-based list. When you purchase the
data, you receive two different files: a lemmatized list (shown first below) and
a "word forms" list (shown second below). The word forms list has approximately
97,000 word forms.
An explanation of the columns for both types of data is found in the
downloadable sample (which contains every tenth word in the 60,000-word
lemmatized list).
Lemmatized

| rank | lemma | PoS | lemFreq | range | range10 | caps |
|---|---|---|---|---|---|---|
| 11610 | hamper | v | 40995 | 0.04 | 0.01 | 0.01 |
| 11611 | toggle | v | 40981 | 0.04 | 0.01 | 0.08 |
| 11612 | bilingual | j | 40979 | 0.03 | 0.01 | 0.2 |
| 11613 | scuba | n | 40974 | 0.02 | 0.01 | 0.15 |
| 11614 | graphite | n | 40961 | 0.02 | 0.01 | 0.26 |
| 11615 | lecture | v | 40957 | 0.04 | 0.01 | 0.03 |
| 11616 | seismic | j | 40932 | 0.02 | 0.01 | 0.14 |
| 11617 | farm | v | 40921 | 0.03 | 0.01 | 0.05 |
| 11618 | tavern | n | 40918 | 0.03 | 0.01 | 0.41 |
| 11619 | orbital | j | 40916 | 0.02 | 0.01 | 0.14 |
| 11620 | frenzy | n | 40897 | 0.16 | 0.01 | 0.18 |

Word forms

| lemRank | lemma | PoS | wordForm | wordFreq |
|---|---|---|---|---|
| 11610 | hamper | v | hampered | 21110 |
| 11610 | hamper | v | hamper | 14194 |
| 11610 | hamper | v | hampering | 5198 |
| 11610 | hamper | v | hampers | 493 |
| 11611 | toggle | v | toggle | 30472 |
| 11611 | toggle | v | toggled | 4302 |
| 11611 | toggle | v | toggles | 3355 |
| 11611 | toggle | v | toggling | 2852 |
| 11612 | bilingual | j | bilingual | 39468 |
| 11612 | bilingual | j | bi-lingual | 1511 |
| 11613 | scuba | n | scuba | 40974 |
| 11614 | graphite | n | graphite | 40961 |
| 11615 | lecture | v | lectured | 13409 |
| 11615 | lecture | v | lecture | 11695 |
| 11615 | lecture | v | lecturing | 10749 |
| 11615 | lecture | v | lectures | 5104 |
| 11616 | seismic | j | seismic | 40932 |
| 11617 | farm | v | farmed | 20092 |
| 11617 | farm | v | farming | 11587 |
| 11617 | farm | v | farm | 7772 |
| 11617 | farm | v | farms | 1470 |
| 11618 | tavern | n | tavern | 34616 |
| 11618 | tavern | n | taverns | 6302 |
| 11619 | orbital | j | orbital | 40916 |
| 11620 | frenzy | n | frenzy | 40025 |
| 11620 | frenzy | n | frenzies | 872 |
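
The two lists appear to be linked through the lemma rank, lemma, and part of speech: in the sample rows above, the wordFreq values for a lemma add up to its lemFreq. The following Python sketch checks that relationship and computes each word form's share of its lemma. It is only an illustration under stated assumptions: the rows are copied inline from the sample above because the file format of the purchased data is not described here, and the names `sum_word_forms` and `form_shares` are hypothetical helpers, not part of any distributed tooling.

```python
from collections import defaultdict

# Rows copied from the sample tables above. The delimiter and layout of the
# purchased files are not described here, so inline tuples are used instead.
LEMMAS = [
    # (rank, lemma, PoS, lemFreq)
    (11610, "hamper", "v", 40995),
    (11611, "toggle", "v", 40981),
    (11612, "bilingual", "j", 40979),
]

WORD_FORMS = [
    # (lemRank, lemma, PoS, wordForm, wordFreq)
    (11610, "hamper", "v", "hampered", 21110),
    (11610, "hamper", "v", "hamper", 14194),
    (11610, "hamper", "v", "hampering", 5198),
    (11610, "hamper", "v", "hampers", 493),
    (11611, "toggle", "v", "toggle", 30472),
    (11611, "toggle", "v", "toggled", 4302),
    (11611, "toggle", "v", "toggles", 3355),
    (11611, "toggle", "v", "toggling", 2852),
    (11612, "bilingual", "j", "bilingual", 39468),
    (11612, "bilingual", "j", "bi-lingual", 1511),
]


def sum_word_forms(word_forms):
    """Total wordFreq per (lemRank, lemma, PoS) key (hypothetical helper)."""
    totals = defaultdict(int)
    for lem_rank, lemma, pos, _form, freq in word_forms:
        totals[(lem_rank, lemma, pos)] += freq
    return dict(totals)


def form_shares(word_forms, totals):
    """Fraction of its lemma's total frequency covered by each word form."""
    return {
        (lemma, pos, form): freq / totals[(lem_rank, lemma, pos)]
        for lem_rank, lemma, pos, form, freq in word_forms
    }


totals = sum_word_forms(WORD_FORMS)

# Lemmas whose lemFreq differs from the sum of their word-form frequencies.
mismatches = [
    (lemma, lem_freq, totals.get((rank, lemma, pos)))
    for rank, lemma, pos, lem_freq in LEMMAS
    if totals.get((rank, lemma, pos)) != lem_freq
]

shares = form_shares(WORD_FORMS, totals)

if __name__ == "__main__":
    print("mismatches:", mismatches)
    print("share of 'hampered':", round(shares[("hamper", "v", "hampered")], 3))
```

For the rows used here, `mismatches` comes out empty. Whether the same holds for every lemma in the full lists is not stated in the text above, so treat this as a sanity check rather than a guaranteed property.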