Word frequency data

from the Corpus del Español



In addition to frequency lists for English, we also have what we believe are the most accurate frequency lists for Spanish, containing the top 20,000 lemmas / words in the language. The Spanish data is based on the 20 million words from the 1900s in the 100 million word Corpus del Español, which is the only corpus of Spanish that is 1) large, 2) balanced across genres (spoken, fiction, newspaper, academic), and 3) accurately tagged for part of speech and lemma (which is necessary to create a frequency dictionary).
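To illustrate why lemma tagging matters for a frequency dictionary, here is a minimal Python sketch that rolls word-form counts up into lemma counts. The form-to-lemma mapping and the token list are invented for illustration; they are not taken from the corpus, and a real list would rely on the corpus's own tagging.

```python
from collections import Counter

# Hypothetical form -> lemma mapping; in practice a tagger supplies this.
FORM_TO_LEMMA = {
    "salgo": "salir",
    "salimos": "salir",
    "salieran": "salir",
    "casa": "casa",
    "casas": "casa",
}


def lemma_frequencies(tokens, form_to_lemma):
    """Count tokens by lemma; unmapped forms count as their own lemma."""
    counts = Counter()
    for token in tokens:
        form = token.lower()
        counts[form_to_lemma.get(form, form)] += 1
    return counts


# Invented token stream, for illustration only.
tokens = ["Salgo", "casa", "salimos", "casas", "salieran", "salgo"]
freqs = lemma_frequencies(tokens, FORM_TO_LEMMA)
print(freqs.most_common())  # [('salir', 4), ('casa', 2)]
```

Without the mapping, each of the four forms of salir would be counted separately and the verb would rank far lower than it should.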

The data is available in a number of formats. Click on the links for sample files. To order, see the information at the end of this page.
 

Type of data: Word/lemma
Explanation: The top 20,000 words, grouped by lemma (so salir = salgo, salimos, salieran, etc.). You can also obtain the frequency of each individual word form of each lemma (for salir: salgo, salieran, etc.), as well as the frequency of the lemma in each of the major genres in the corpus (spoken, fiction, newspaper, and academic).
Samples: 20,000 lemma list; by word form; by genre
Price (Acad¹ / Com²): $100 / $200 for the 20,000 lemma list. Add $75 / $150 for frequency by word form, and $75 / $150 for frequency by genre.

Type of data: N-grams
Explanation: The frequency of all two-word (2-gram), three-word (3-gram), or other n-gram strings. With these lists, you can quickly and easily find the frequency of combinations of words across the corpus, without having to use the corpus interface. In addition, you can specify the words for which you want n-grams (e.g. the top 20,000 lemmas, all NOUN + de + NOUN sequences, or all words in a customized 30,000 word list that you send to us).
Samples: 2-grams; 3-grams
Price (Acad¹ / Com²): $100 / $200

Type of data: Synonyms
Explanation: 320,000 synonyms for 29,000 headwords
Samples: Synonyms
Price (Acad¹ / Com²): $75 / $150

Type of data: Other data
Explanation: If there is other data that you could use (without having access to the full text), please let us know. Examples might be the frequency of each word or phrase in a 30,000 word/phrase list, or the frequency of all synonyms for the top 10,000 lemmas in the corpus.

Note:
1  = Academic license, 2 = Commercial license
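
As a rough illustration of what a 2-gram or 3-gram frequency list contains, the following Python sketch counts contiguous word sequences in a short, invented phrase. The function name and the sample text are assumptions made for this example; the distributed lists are built from the full corpus and may differ in format.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Return a Counter of all contiguous n-token sequences."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Invented sample text, for illustration only.
tokens = "la casa de campo y la casa de piedra".split()

bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)

print(bigrams[("la", "casa")])         # 2
print(trigrams[("la", "casa", "de")])  # 2
```

A precomputed list of this kind lets you look up the frequency of a word combination directly, which is the use case described for the n-gram data.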

To order data.

1. Download and fill out the non-disclosure agreement (NDA) via the links in the blue columns above. This states that you will not give the data to anyone outside your university or company (which also means that you cannot post it on the web). You only need to fill in your name and, if applicable, your company, and then send it back to us as an attachment.

Remember that to receive academic pricing, you must send the NDA from an academic email address (not Gmail or a similar service).

2. Once we receive the NDA, we'll send you a request for payment from PayPal.

3. As soon as we receive confirmation of the payment, we'll send you the link to download the data.

Thanks for your interest.