This is a lexicographic resource created and made available by researchers at the University Paris Descartes (Paris 5). The website is http://www.lexique.org/telLexique.php. It contains about 150,000 French words or expressions and was compiled from old texts and film subtitles.
- The licence is Creative Commons: CC BY-NC-SA.
- The number of entries is fairly large: about 150,000.
- About 5,000 listed lemmas point to about 2,000 nonexistent entries. We discovered this issue after converting the data into a table, loading it into a database management system, and running a self-join (joining the table to itself) to update the links between entries and lemmas.
- Too many Latin expressions, because of the old (often religious) texts. An obvious example: does « habemus papan » really belong in a French vocabulary list?
- Many mistakes, such as « m’ », « t’ » and « s’ » being classified under several different grammatical categories, some of which make no sense.
- A great many bizarre expressions can be found in the list.
- Not enough grammatical categories.
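The self-join check mentioned in the list above can be sketched roughly as follows. This is only a minimal illustration using Python's built-in `sqlite3` module, not the exact procedure we ran: the table name `lexique`, the column names `ortho` (the entry) and `lemme` (its lemma), the helper `find_dangling_lemmas`, and the sample rows are all assumptions made for the example.

```python
import sqlite3


def find_dangling_lemmas(rows):
    """Return, sorted, the lemmas referenced by an entry but absent as entries.

    `rows` is an iterable of (ortho, lemme) pairs. The table and column
    names are hypothetical and chosen for this sketch only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lexique (ortho TEXT, lemme TEXT)")
    conn.executemany("INSERT INTO lexique VALUES (?, ?)", rows)

    # Self-join: match each entry's lemma against the entries of the same
    # table. The LEFT JOIN keeps rows whose lemma has no matching entry;
    # for those rows target.ortho is NULL.
    query = """
        SELECT DISTINCT entry.lemme
        FROM lexique AS entry
        LEFT JOIN lexique AS target ON target.ortho = entry.lemme
        WHERE target.ortho IS NULL
        ORDER BY entry.lemme
    """
    dangling = [row[0] for row in conn.execute(query)]
    conn.close()
    return dangling


# Toy data invented for illustration; it is not taken from the real list.
sample = [
    ("chat", "chat"),
    ("chats", "chat"),
    ("mangeait", "manger"),  # "manger" has no entry of its own here
    ("manges", "manger"),
]

print(find_dangling_lemmas(sample))  # ['manger']
```

Counting the rows returned by such a query, and the entries that reference them, is one way to arrive at figures like the ones quoted above.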
The word list is fairly large, but much of the data is irrelevant and intended more for researchers.
The number of mistakes and inconsistencies relative to the number of words, even setting aside the supplementary information attached to each entry, quickly led us to abandon the data from this project.