/*
 * English words for generation of easy to memorize random passphrases.
 * This list comes from the MakePass passphrase generator developed by
 * Dianelos Georgoudis <dianelos at tecapro.com>, which was announced on
 * sci.crypt on 1997/10/24. Here's a relevant excerpt from that posting:
 *
 * > The 4096 words in the word list were chosen according to the following
 * > criteria:
 * > - each word must contain between 3 and 6 characters
 * > - each word must be a common English word
 * > - each word should be clearly different from each other
 * >   word, orthographically or semantically
 * >
 * > The MakePass word list has been placed in the public domain
 *
 * At least two other sci.crypt postings by Dianelos Georgoudis also state
 * that the word list is in the public domain, and so did the web page at:
 *
 * https://web.archive.org/web/%2a/http://www.tecapro.com/makepass.html
 *
 * which existed until 2006 and is available from the Wayback Machine as of
 * this writing (March 2010). Specifically, the web page said:
 *
 * > The MakePass word list has been placed in the public domain. To download
 * > a copy click here. You can use the MakePass word list for many other
 *
 * "To download a copy click here" was a link to free/makepass.lst, which is
 * currently available via the Wayback Machine:
 *
 * https://web.archive.org/web/%2a/http://www.tecapro.com/free/makepass.lst
 *
 * Further lists of common English words were appended to the end to allow for
 * subsequent removal of "inappropriate" words from the initial list.
 *
 * Even though the original description of the list stated that "each word
 * must contain between 3 and 6 characters", there were two 7-character words.
 * These have been removed.
 *
 * Many "inappropriate" words have then been moved to near the end of the
 * list, so that they're not used for generated passphrases.
 *
 * The code in passwdqc_check.c and passwdqc_random.c makes the following
 * assumptions about this list:
 *
 * - the first 4096 words are for random passphrase generation, and there are
 *   at least this many words present;
 * - the words are up to 6 characters long;
 * - although some words may contain capital letters, no two words differ by
 *   the case of characters alone (e.g., converting the list to all-lowercase
 *   would yield a list of as many unique words);
 * - the words contain alphabetical characters only;
 * - if an entire word on this list matches the initial substring of other
 *   word(s) on the list, it is likely placed immediately before those words
 *   (e.g., "bake", "baker", "bakery"), which speeds up the "word-based" check.
 *
 * Additionally, the default minimum passphrase length of 11 characters
 * specified in passwdqc_parse.c has been chosen such that a passphrase
 * consisting of any three words from this list with two separator
 * characters will pass the minimum length check (3 * 3 + 2 = 11). In other
 * words, this default assumes that no word is shorter than 3 characters.
 */
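
/*
 * The assumptions listed above can be checked mechanically. What follows is
 * a minimal sketch only, not code from passwdqc_check.c or passwdqc_random.c.
 * Every SKETCH_ and sketch_ name is hypothetical, SKETCH_WORD_MAX is assumed
 * to be 6 (from "up to 6 characters"), and words are assumed to be NUL-padded
 * but not necessarily NUL-terminated. The pairwise case check is quadratic,
 * which is acceptable for a one-off check of a few thousand words. The
 * prefix-ordering property is only described as "likely", so it is not
 * checked here.
 */
#include <ctype.h>
#include <stddef.h>

#define SKETCH_WORD_MAX 6	/* assumed array width */
#define SKETCH_WORD_MIN 3	/* shortest word the default relies on */

/* Compile-time check: three shortest words plus two separators reach 11 */
typedef char sketch_min_length_ok[(3 * SKETCH_WORD_MIN + 2 >= 11) ? 1 : -1];

/* Length of a fixed-width word that may lack a terminating NUL */
static size_t sketch_word_length(const char *word)
{
	size_t n = 0;

	while (n < SKETCH_WORD_MAX && word[n])
		n++;
	return n;
}

/* Non-zero if two fixed-width words are equal when case is ignored */
static int sketch_same_word(const char *a, const char *b)
{
	size_t i;

	for (i = 0; i < SKETCH_WORD_MAX; i++) {
		if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
			return 0;
		if (!a[i])
			break;
	}
	return 1;
}

/*
 * Checks a list terminated by an empty word. Returns 0 if all checked
 * assumptions hold, 1 if there are fewer than min_count words, 2 if a word
 * is shorter than SKETCH_WORD_MIN, 3 if a word has a non-alphabetical
 * character, 4 if two words differ by case alone (or are identical).
 */
static int sketch_check_wordset(const char (*words)[SKETCH_WORD_MAX],
    size_t min_count)
{
	size_t count, i, j, k, len;

	for (count = 0; words[count][0]; count++) {
		len = sketch_word_length(words[count]);
		if (len < SKETCH_WORD_MIN)
			return 2;
		for (k = 0; k < len; k++)
			if (!isalpha((unsigned char)words[count][k]))
				return 3;
	}
	if (count < min_count)
		return 1;
	for (i = 0; i < count; i++)
		for (j = i + 1; j < count; j++)
			if (sketch_same_word(words[i], words[j]))
				return 4;
	return 0;
}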

#include "wordset_4k.h"

const char _passwdqc_wordset_4k[][WORDSET_4K_LENGTH_MAX] = {
	"Africa", /* we had "Europe" */
	"South", /* we had the other 3, but not this one */
	"anyone", /* some previously missed very common English words */
	"admin", /* words common in computing */
	"aisle", /* BIP-0039 */
	"cookie", /* more previously missed common English words */
	"arms", /* gutenberg-19xx-lowercase-words-1000plus.txt top 1000 */
	"asking", /* gutenberg-19xx-lowercase-words-1000plus.txt top 2000 */
	"acted", /* gutenberg-19xx-lowercase-words-1000plus.txt top 3000 */
	"acres", /* gutenberg-19xx-lowercase-words-1000plus.txt top 4000 */
	"agents", /* the rest of gutenberg-19xx-lowercase-words-1000plus.txt */
	"affix", /* eff_short_wordlist_1.txt */
	"abacus", /* eff_large_wordlist.txt */
	"abyss", /* eff_short_wordlist_2_0.txt */
	"bush", /* these two were merely words, now may be taken as referring to specific people */
	"church", /* religious references (more of these are among the capitalized words below) */
	"black", /* races, skin colors, and ethnicities */
	"Afghan", /* assorted capitalized words undesirable in passphrases, including: */
	"Allah", /* religious references */
	"Anglo", /* ethnicities, nationalities (but country names stay), other categories of people */
	"Bach", /* specific people */
	"Madame", /* has an inappropriate meaning */
	"Mrs", /* abbreviation */
	"Ritz", /* brands (the EFF lists also include some, but might not be reached) */
	"Sony", /* "Amazon" stayed for its original meaning */
	"abort", /* generally politically incorrect */
	"booze", /* some references to drugs and alcohol (but specific drink names stay above) */
	"kill", /* violence */
	"arouse", /* obscene, intimate anatomy, sexually explicit or suggestive */
	"thrust", /* Ubuntu bug 1407629 claims "Probe!thrust6scorn" is inappropriate */
	"uterus", /* "womb" is on EFF lists, so assumed OK - more figurative than anatomical */
	"advice", /* too similar */
	"armor", /* multiple spellings */
	"cheque", /* or "check", which is preserved above for different meaning */
	"email", /* or "e-mail" */
	"flavor", /* or "flavour" */
	"harbor", /* or "harbour" */
	"learnt", /* or "learned" */
	"whisky", /* or "whiskey" */
	"yoyo", /* or "yo-yo" */
	"etc", /* Latin, not English */
#if 0 /* obsolete, dialectal, and non-words from gutenberg-19xx-lowercase-words-1000plus.txt */
	"html", /* obvious noise from how gutenberg-19xx-lowercase-words-1000plus.txt was created */
#endif
	"" /* end of list marker */
};