Word frequency lists and dictionary
from the Corpus of Contemporary American English



The data is available in six formats, as shown below. Click here to order.
 

1.  Wordlist
  • Lemma, rank, part of speech, dispersion score

  • Text or Excel file: can be printed / copied

  • List sizes: 5,000, 10,000, 20,000, 40,000, 60,000


Short sample (see expanded sample: 6,000 entries: every tenth word 1-60,000)

rank   lemma / word PoS frequency dispersion
7309   attic n 2711 0.91
17311   tearful j 542 0.93
27303   tailgate v 198 0.85
37310   hydraulically r 78 0.83
47309   unsparing j 35 0.83
57309   embryogenesis n 22 0.66


 

2.  Wordlist + genre frequency
  • Overall frequency (+dispersion), as above. But also includes:

  • Frequency in five main genres: spoken, fiction, popular magazine, newspaper, academic

  • Frequency in each of the 40+ sub-genres (e.g. MAG-Sports, NEWS-Financial, ACAD-Medicine)

  • With the frequency data for specific genres and sub-genres, you can create customized wordlists for specific purposes: medical, technology, sports, etc.

  • Excel file; can be printed / copied

  • List sizes: 5,000, 10,000, 20,000, 40,000, 60,000
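
As a rough illustration of the customized wordlists mentioned above, the following Python sketch keeps only the words for which a chosen sub-genre accounts for a large share of the total frequency. The rows are copied from the short sample for this format (totFreq and the A2 / ACAD-Medicine column). The function name `custom_wordlist` and the `min_share` cutoff are assumptions made for this example; they are not part of the downloadable data.

```python
# Rows taken from the short sample on this page: (lemma, PoS, totFreq, A2),
# where A2 is the ACAD-Medicine sub-genre frequency.
SAMPLE = [
    ("piglet", "n", 239, 2),
    ("woodsman", "n", 300, 0),
    ("candied", "j", 242, 0),
    ("metacognitive", "j", 306, 0),
    ("industry-wide", "j", 236, 6),
    ("health-food", "j", 246, 2),
    ("posterior", "n", 240, 99),
]


def custom_wordlist(rows, min_share=0.25):
    """Return lemmas whose sub-genre share of total frequency is at least
    min_share, ordered from highest share to lowest.

    min_share is a hypothetical cutoff chosen for illustration only.
    """
    scored = [(sub / total, lemma) for lemma, _pos, total, sub in rows if total > 0]
    kept = [(share, lemma) for share, lemma in scored if share >= min_share]
    return [lemma for _share, lemma in sorted(kept, reverse=True)]


medical = custom_wordlist(SAMPLE)
print(medical)  # ['posterior']
```

Lowering `min_share` widens the list; the right cutoff depends on the purpose and on how large the sub-genre is relative to the whole corpus.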

 

Short sample (see expanded sample: 6,000 entries: every tenth word 1-60,000)
Note 1:  Due to space constraints, in this sample only six of the 40+ sub-genres are shown (M1: MAG-Financial; M2: MAG-Science/Tech; N1: NEWS-Sports; N2: NEWS-Editorial; A1: ACAD-Law/PolSci; A2: ACAD-Medicine).
Note 2:  The green shading for the five main genres highlights those words whose frequency in that genre is at least double what would otherwise be expected (based on genre size).
 

rank   lemma / word    PoS  disp  totFreq  spok  fic  mag  news  acad  M1  M2  N1  N2  A1  A2
25083  piglet          n    0.88  239      20    97   54   46    22    10  2   3   3   0   2
25088  woodsman        n    0.70  300      10    176  77   12    25    1   2   1   3   2   0
25090  candied         j    0.87  242      17    49   102  73    1     0   1   2   1   0   0
25093  metacognitive   j    0.69  306      0     0    0    0     306   0   0   0   0   0   0
25107  industry-wide   j    0.89  236      16    2    64   109   45    19  10  2   1   10  6
25108  health-food     j    0.85  246      10    19   154  55    8     6   4   7   1   0   2
25110  posterior       n    0.88  240      6     30   36   27    139   0   5   4   0   0   99
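
The "at least double what would be expected" rule in Note 2 can be sketched in a few lines of Python. The actual genre sizes are not given on this page, so the sketch assumes five equal-sized genres; with the real sizes the flagged cells may differ. The names `ASSUMED_SHARE` and `overrepresented` are hypothetical and used only for this illustration.

```python
GENRES = ("spok", "fic", "mag", "news", "acad")

# ASSUMPTION: all five genres are the same size (each 1/5 of the corpus).
# Replace these shares with the real relative genre sizes when known.
ASSUMED_SHARE = {genre: 1 / len(GENRES) for genre in GENRES}


def overrepresented(total, freqs, shares=ASSUMED_SHARE, factor=2.0):
    """Return the genres whose observed frequency is at least `factor`
    times the frequency expected from the genre's share of the corpus."""
    return [g for g in GENRES if freqs[g] >= factor * total * shares[g]]


# Genre frequencies for two rows of the sample table above.
piglet = dict(zip(GENRES, (20, 97, 54, 46, 22)))
metacognitive = dict(zip(GENRES, (0, 0, 0, 0, 306)))

print(overrepresented(239, piglet))         # ['fic']
print(overrepresented(306, metacognitive))  # ['acad']
```

Under the equal-size assumption, a genre is flagged when it holds at least 40% of a word's total frequency (2 x 1/5).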

 

 

3.  Wordlist + collocates
  • More than 4,800,000 entries, showing which words occur most frequently with others.

  • 200-300 collocates each for most of the top 20,000-30,000 words, and somewhat fewer for less-common words.

  • Collocates provide useful information on word meaning and usage

  • List sizes: 5,000, 10,000, 20,000, 40,000, 60,000

Short sample (see expanded sample: 45,390 entries; every hundredth word 1-60,000):

nodeID node nodePoS collocate collPoS freq MutInfo preNode postNode % preNode
15349 smolder v still r 76 4.39 74 2 0.97
15349 smolder v fire n 59 6.33 39 20 0.66
15349 smolder v eye n 43 4.41 24 19 0.55
15349 smolder v cigarette n 26 6.93 17 9 0.65
15349 smolder v ash n 15 7.42 5 10 0.33
15349 smolder v ember n 14 10.62 4 10 0.28
15349 smolder v resentment n 14 8.26 2 12 0.14
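
In the sample rows above, freq equals preNode + postNode, and the "% preNode" column appears to be preNode divided by freq, truncated (not rounded) to two decimals. That reading is inferred from the sample values only, not from any documented definition. A minimal Python sketch under that assumption, with the hypothetical helper name `pre_node_share`:

```python
def pre_node_share(pre_node, freq):
    """Share of co-occurrences in which the collocate precedes the node word,
    truncated to two decimals (ASSUMPTION: inferred from the sample rows)."""
    return (pre_node * 100 // freq) / 100


# (collocate, freq, preNode, postNode) for the node word "smolder",
# copied from the sample table above.
ROWS = [
    ("still", 76, 74, 2),
    ("fire", 59, 39, 20),
    ("eye", 43, 24, 19),
    ("cigarette", 26, 17, 9),
    ("ash", 15, 5, 10),
    ("ember", 14, 4, 10),
    ("resentment", 14, 2, 12),
]

for collocate, freq, pre, post in ROWS:
    assert pre + post == freq  # holds for every row in the sample
    print(collocate, pre_node_share(pre, freq))
```

A high value (e.g. "still smoldering") means the collocate usually comes before the node; a low value (e.g. "smoldering resentment") means it usually follows.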


 

4.  N-grams
  • All 155,000,000+ distinct 3-grams in the corpus. N-grams table linked to "lexicon" table.

  • Can run an unlimited number of queries against the corpus from your own machine

  • More information

 

Short sample (see expanded sample: 194,000 n-grams)
Note: for ease of presentation, the words themselves are displayed in this sample table. In the downloadable files, there are two tables: a lexicon (with a unique wordID, along with word form, lemma, and PoS), and an n-grams table that has the wordID values from the lexicon table (as integer values, for smaller size and faster searching).
 

frequency   word1 word2 word3
1419   much the same
461   much more likely
432   much better than
266   much more difficult
235   much of the
226   much more than
195   much less a
194   much like a
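
To illustrate the two-table layout described in the note above (a lexicon keyed by wordID, and an n-grams table that stores only integer wordID values), here is a small Python sketch using the standard-library `sqlite3` module. The table names, column names, and wordID values are all hypothetical; the real files may be organized differently, and the real lexicon also carries PoS, which is omitted here. Only the two frequencies and word sequences are taken from the sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lexicon (wordID INTEGER PRIMARY KEY, word TEXT, lemma TEXT);
    CREATE TABLE ngrams  (freq INTEGER, w1 INTEGER, w2 INTEGER, w3 INTEGER);
""")

# Hypothetical wordID values, invented for this example.
conn.executemany("INSERT INTO lexicon VALUES (?, ?, ?)", [
    (1, "much", "much"), (2, "the", "the"), (3, "same", "same"),
    (4, "more", "more"), (5, "likely", "likely"),
])
# Two 3-grams from the sample, stored as integer wordIDs.
conn.executemany("INSERT INTO ngrams VALUES (?, ?, ?, ?)", [
    (1419, 1, 2, 3),  # much the same
    (461, 1, 4, 5),   # much more likely
])


def trigrams_starting_with(conn, word):
    """Look up 3-grams by first word, joining the lexicon once per slot
    to turn wordIDs back into word forms."""
    sql = """
        SELECT n.freq, a.word, b.word, c.word
        FROM ngrams AS n
        JOIN lexicon AS a ON a.wordID = n.w1
        JOIN lexicon AS b ON b.wordID = n.w2
        JOIN lexicon AS c ON c.wordID = n.w3
        WHERE a.word = ?
        ORDER BY n.freq DESC
    """
    return conn.execute(sql, (word,)).fetchall()


for row in trigrams_starting_with(conn, "much"):
    print(row)
```

Storing integers in the n-grams table keeps it compact, and the joins recover the readable forms only for the rows a query actually returns.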

 

5.  eBook
  • Top 20-30 collocates for each word, grouped by part of speech, as well as synonyms (for most words)

  • Note that this file cannot be edited, printed, or copied from

  • List sizes: 5,000, 10,000, 20,000

Short sample (see expanded sample: 2,700 entries; every seventh word 1-20,000):

1421 blow v
noun  wind, whistle, air, nose, smoke, breeze, face, hair, kiss, head, window, horn, candle, mind, storm
misc  away, through, across
out  candle, window, breath, air, wind, smoke, knee, tire, match
up  building, plot, bomb, plane, car, bridge, wind, threaten
off  steam, head, roof, leg
● whoosh, gust, waft, puff || move, propel, drive, carry
27254 | 0.94 F
19964 bodice n
adj  black, tight, white, embroidered, fitted, red, blue, gathered, beaded, pleated
noun  dress, skirt, gown, lace, sleeve, ripper, back, satin, silk, waist
verb  embroider, rip, pull, fit, feature, wear, cover, lace, cut, tuck
421 | 0.86 F

 

 

6.  Printed book
  • Top 20-30 collocates for each word, grouped by part of speech.

  • Also contains 31 frequency-based, thematically-oriented lists

  • List size: 5,000 words

Short sample (see expanded sample: every seventh page in the book):