How it Works


This website generates made-up words by using Markov chains to learn and reproduce patterns found in existing word lists. When generating a new word, each letter is picked in sequence. With a Markov chain of order 4, each letter is decided from the 4 letters that appear before it. To figure out what the next letter should be, the program scans an existing word list for every time those 4 letters appear in a row and notes which letter follows them. It then randomly picks one of those letters, with letters that follow more frequently having a higher probability of being picked.
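
The process described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the site's actual code: the names `build_model` and `generate_word`, the `^` padding used to start a word, and the `$` end-of-word marker are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict

START = "^"  # hypothetical padding symbol for the start of a word
END = "$"    # hypothetical symbol marking the end of a word


def build_model(words, order=4):
    """Count which symbol follows each run of `order` symbols."""
    model = defaultdict(Counter)
    for word in words:
        # Pad the word so the first letters and the word ending are learned too.
        padded = START * order + word + END
        for i in range(len(padded) - order):
            context = padded[i:i + order]
            model[context][padded[i + order]] += 1
    return model


def generate_word(model, order=4, rng=random):
    """Pick letters one at a time, weighted by how often each was seen."""
    context = START * order
    letters = []
    while True:
        counts = model[context]
        # Symbols seen more often after this context are more likely to be picked.
        nxt = rng.choices(list(counts), weights=list(counts.values()), k=1)[0]
        if nxt == END:
            return "".join(letters)
        letters.append(nxt)
        context = (context + nxt)[-order:]  # keep only the last `order` symbols
```

Calling `generate_word(build_model(word_list))` with a large `word_list` would produce one made-up word per call. Treating the end of a word as just another symbol that can be picked is one simple way to let the chain decide when to stop.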


Example

  1. Suppose the current letters are barn. We look at our source word list and see that the following words contain the letters "barn": barn barnacle barnacled barnacles barnburner barndoor barney barnful barnstormer barnyard cowbarn. Since the letter "a" occurred after "barn" in 3 out of the 11 words, it has a 3/11, or 27%, chance of being picked as our next letter. For simplicity's sake, let's say we randomly chose it.
  2. Since we chose "a" for our letter, the previous 4 letters are now arna. We look at our source word list and see that there are 19 words with "arna", including barnacle carnage carnally carnassial incarnation reincarnate and so on. We only saw the letter "g" once, but that still means that it might be chosen as the next letter. Let's randomly pick it.
  3. Now our previous 4 letters are rnag. There is only one word with this letter combination in our entire English word list! That word is carnage. That means that the next letter has to be "e".
  4. Now our previous 4 letters are nage. This sequence is seen 52 times total. For 30 of those instances, it marks the end of the word. There is also a chance that we might pick "a", "d", "i", "m", "n", or "r", since those are also seen after "nage". We'll randomly choose "m", which has a 6% chance of getting picked.
  5. Now our previous 4 letters are agem. This sequence is seen 14 times total. The letter "e" occurred 9 times (64%), so let's randomly pick it.
  6. Now our previous 4 letters are geme. This sequence is seen 24 times, but the next letter is always "n". So that will be our next letter.
  7. Now our previous 4 letters are emen. This sequence is very common; it has 235 occurrences. In 219 of them, the next letter is "t" (think of words like management). So let's suppose that "t" is chosen to be next.
  8. Now our previous 4 letters are ment. This sequence is even more common, with 698 occurrences. The letter "a" is pretty common (16% chance), but the most common outcome is for the word to end at this point, since that happened 465 out of the 698 times. So we'll end the word here. Our final word is barnagement!
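
The arithmetic in step 1 can be checked with a short Python sketch that counts what follows "barn" in the 11 words listed there. The function names and the `$` end-of-word marker are assumptions made for illustration; the full word list used by the site is not reproduced here, so only step 1 can be verified this way.

```python
from collections import Counter

END = "$"  # hypothetical end-of-word marker, not part of the original text

# The 11 words containing "barn" that are listed in step 1.
BARN_WORDS = [
    "barn", "barnacle", "barnacled", "barnacles", "barnburner", "barndoor",
    "barney", "barnful", "barnstormer", "barnyard", "cowbarn",
]


def follower_counts(words, context):
    """Count what comes right after every occurrence of `context`."""
    counts = Counter()
    for word in words:
        padded = word + END  # so "end of word" is counted as an outcome
        start = padded.find(context)
        while start != -1:
            counts[padded[start + len(context)]] += 1
            start = padded.find(context, start + 1)
    return counts


def follower_probabilities(words, context):
    """Turn the raw counts into probabilities that sum to 1."""
    counts = follower_counts(words, context)
    total = sum(counts.values())
    return {symbol: n / total for symbol, n in counts.items()}


probs = follower_probabilities(BARN_WORDS, "barn")
print(f'{probs["a"]:.0%}')  # 27%
```

In this small list, the end-of-word outcome accounts for 2 of the 11 cases (barn and cowbarn), and each of the remaining letters follows "barn" exactly once.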