The following are samples of the data that you can get from the new
100,000-word list of English. If you get the full 100,000-word list as
an Excel file (see samples: xlsx, xls), you can sort the full list
however you want: you can sort and limit it by part of speech, by
overall frequency in COCA, COHA, the BNC, or SOAP, or by frequency in
any of the genres of COCA or the BNC (e.g. academic or newspapers). You
might also look at a sample of 5,000 words taken at regular intervals
from the list (every twentieth word, from 1 to 100,000) to check the
accuracy of the list.
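The sorting, limiting, and sampling described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how the spreadsheet itself works: the `Entry` fields (`rank`, `word`, `pos`, `coca`, `bnc`), the part-of-speech codes, and the helper names are hypothetical, and `synthetic_list` produces placeholder words and counts rather than real frequencies. The every-twentieth-word sample is assumed to start at rank 1.

```python
from dataclasses import dataclass


# Hypothetical row layout; the real spreadsheet's columns may be named differently.
@dataclass
class Entry:
    rank: int   # overall rank in the list (1 = most frequent)
    word: str
    pos: str    # part-of-speech code (hypothetical codes, e.g. "n", "v")
    coca: int   # raw frequency in COCA
    bnc: int    # raw frequency in the BNC


def sort_and_limit(entries, key, pos=None, limit=None):
    """Keep one part of speech (if given), sort by `key` in descending order,
    and return at most `limit` entries."""
    rows = [e for e in entries if pos is None or e.pos == pos]
    rows.sort(key=key, reverse=True)
    return rows if limit is None else rows[:limit]


def every_nth(entries, n=20):
    """Systematic sample: every n-th entry by rank, starting at rank 1 (an assumption)."""
    ordered = sorted(entries, key=lambda e: e.rank)
    return ordered[::n]


def synthetic_list(size):
    """Placeholder data only: made-up words and counts, not real frequencies."""
    return [Entry(rank=r, word=f"word{r}", pos="n" if r % 2 else "v",
                  coca=size - r, bnc=(size - r) // 4)
            for r in range(1, size + 1)]


if __name__ == "__main__":
    words = synthetic_list(100_000)
    top_nouns = sort_and_limit(words, key=lambda e: e.coca, pos="n", limit=3)
    print([e.word for e in top_nouns])   # ['word1', 'word3', 'word5']
    print(len(every_nth(words, 20)))     # 5000
```

Taking every twentieth entry of a 100,000-entry list yields exactly 5,000 entries, which matches the size of the sample mentioned above. Swapping the `key` (e.g. `lambda e: e.bnc`) is how you would re-sort by a different corpus or genre column.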
| # | Sample list | Description (click to download) | # entries |
|---|---|---|---|
| 1 | Rank-ordered | Every 100th word, 1-100,000 | 1,008 |
| 2 | Alphabetical | All words starting with [V-] | 1,212 |
| 3 | Part of speech | Simple past tense forms (every fifth entry) | Top 1,000 |
| 4 | Genres (COCA: Academic) | Verbs that are 50% more frequent in COCA Academic (per million words) than overall | 1,188 |
| 5 | Dialects (COCA / BNC) | Nouns that are at least 10 times as frequent in COCA (overall) as in the BNC (random entries) | Top 1,000 |
| 6 | New words (COCA / COHA) | Nouns that are at least 10 times as frequent in COCA (1990-2012) as in COHA, 1950-1989 (random entries) | Top 1,000 |
| 7 | Informal words | Adjectives that are at least twice as frequent in SOAP (soap operas) as in COCA (overall) | 255 |
Note: no entries have been removed from these sample lists. In other
words, the full list (words 1-100,000) has not been "cleaned up" for
these sample lists, and as a result, the accuracy of the sample lists
is indicative of the accuracy of the full list.
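The frequency-ratio criteria in samples 4 to 7 can be expressed as a single filter. The sketch below assumes that counts are compared as rates per million words, which sample 4 states explicitly; for the other comparisons this normalization is an assumption, since corpora of different sizes cannot be compared by raw counts. The corpus sizes, row keys (`word`, `pos`, `coca`, `bnc`), and example rows are hypothetical placeholders, not figures from the actual list.

```python
import math


def per_million(count, corpus_size):
    """Convert a raw count to occurrences per million words of the corpus."""
    return count * 1_000_000 / corpus_size


def freq_ratio(count_a, size_a, count_b, size_b):
    """How many times as frequent a word is in corpus A as in corpus B,
    comparing per-million rates so corpora of different sizes are comparable."""
    pm_a = per_million(count_a, size_a)
    pm_b = per_million(count_b, size_b)
    if pm_b == 0:
        # Absent from B: infinitely more frequent in A (or 0.0 if absent from both).
        return math.inf if pm_a > 0 else 0.0
    return pm_a / pm_b


def filter_by_ratio(rows, pos, col_a, size_a, col_b, size_b, factor):
    """Rows with part of speech `pos` that are at least `factor` times as
    frequent (per million words) in corpus A as in corpus B."""
    return [r for r in rows
            if r["pos"] == pos
            and freq_ratio(r[col_a], size_a, r[col_b], size_b) >= factor]


# Hypothetical corpus sizes and rows, for illustration only.
COCA_SIZE = 400_000_000
BNC_SIZE = 100_000_000
rows = [
    {"word": "alpha", "pos": "n", "coca": 500, "bnc": 10},   # 1.25 vs 0.1 per million
    {"word": "beta", "pos": "n", "coca": 500, "bnc": 100},   # 1.25 vs 1.0 per million
    {"word": "gamma", "pos": "v", "coca": 500, "bnc": 1},    # not a noun
]

# A criterion like sample 5: nouns at least 10 times as frequent in COCA as in the BNC.
print([r["word"] for r in filter_by_ratio(rows, "n", "coca", COCA_SIZE, "bnc", BNC_SIZE, 10)])
# ['alpha']
```

Under the same assumptions, sample 4 corresponds to `factor=1.5` with COCA Academic as corpus A and COCA overall as corpus B, and sample 7 to `factor=2` with SOAP as corpus A and COCA as corpus B.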