How it Works
This website generates made-up words using Markov chains, which learn and reproduce the patterns
found in existing word lists. When generating a new word, each letter is picked in sequence. If we are using a Markov chain of order 4,
each letter is decided from the 4 letters that appear before it. To figure out what the next letter should be, the program looks at
an existing word list, finds every place where those 4 letters appear in a row, and notes which letter follows them. It
then randomly picks one of those letters, with letters that follow more frequently having a higher probability of being picked.
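The frequency-weighted pick described above could be sketched as follows. This is a minimal illustration, not the site's actual code; the function name `weighted_pick` and the shape of `counts` (a mapping from letter to the number of times it was seen) are assumptions made for the example.

```python
import random

def weighted_pick(counts, rng=random):
    """Pick one letter at random; a letter seen more often is more likely to win.

    `counts` maps each candidate letter to how many times it followed the
    current context in the word list (hypothetical structure for this sketch).
    """
    letters = list(counts)
    weights = [counts[letter] for letter in letters]
    # random.choices draws with probability proportional to the weights.
    return rng.choices(letters, weights=weights, k=1)[0]
```

For example, with `counts = {"a": 3, "b": 1}`, the letter "a" would be picked about three times out of four.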
Example
-
Suppose the current letters are barn. We look at our source word list and see that the following words contain the letters "barn":
barn
barnacle
barnacled
barnacles
barnburner
barndoor
barney
barnful
barnstormer
barnyard
cowbarn.
Since the letter "a" occurred after "barn" in 3 of the 11 words, it has a 3/11, or 27%, chance of
being picked as our next letter. For simplicity's sake, let's say we randomly chose it.
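The 3/11 figure can be checked by counting what follows "barn" in the eleven words listed above. This is a sketch only: the `END` marker used for words that stop right after "barn" is an assumption of this example, as is the helper name `next_letter_counts`.

```python
from collections import Counter

# The eleven words from the example above.
WORDS = [
    "barn", "barnacle", "barnacled", "barnacles", "barnburner", "barndoor",
    "barney", "barnful", "barnstormer", "barnyard", "cowbarn",
]
END = "$"  # assumed marker meaning "the word ends here"

def next_letter_counts(words, context):
    """Count which letter (or END) follows each occurrence of `context`."""
    counts = Counter()
    for word in words:
        start = word.find(context)
        while start != -1:
            after = start + len(context)
            counts[word[after] if after < len(word) else END] += 1
            start = word.find(context, start + 1)
    return counts

counts = next_letter_counts(WORDS, "barn")
total = sum(counts.values())
print(f"{counts['a']}/{total} = {counts['a'] / total:.0%}")  # 3/11 = 27%
```

Note that "barn" and "cowbarn" both count toward the total even though no letter follows "barn" in them; they are the cases where the word ends.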
-
Since we chose "a" for our letter, the previous 4 letters are now arna. We look at our source word list and see that there are 19 words with "arna",
including but not limited to:
barnacle
carnage
carnally
carnassial
incarnation
reincarnate
and so on.
We only saw the letter "g" once, but that still means that it might be chosen as the next letter. Let's randomly pick it.
-
Now our previous 4 letters are rnag. There is only one word with this letter combination in our entire English word list!
That word is carnage.
That means that the next letter has to be "e".
-
Now our previous 4 letters are nage. This sequence is seen 52 times total. For 30 of those instances, it marks the end of the word.
There is also a chance that we might pick "a", "d", "i", "m", "n", or "r", since those are also seen after "nage". We'll randomly choose "m",
which has a 6% chance of getting picked.
-
Now our previous 4 letters are agem. This sequence is seen 14 times total. The letter "e" occurred 9 times (64%), so let's randomly
pick it.
-
Now our previous 4 letters are geme. This sequence is seen 24 times, but the next letter is always "n". So that will be our next letter.
-
Now our previous 4 letters are emen. This sequence is very common; it has 235 occurrences. In 219 of them, "t" is the next
letter (think of words like management). So let's suppose that "t" is chosen to be next.
-
Now our previous 4 letters are ment. This sequence is even more common, with 698 occurrences. The letter "a" is pretty common
(16% chance), but the most common outcome is for the word to end at this point, since that happened 465 out of the 698 times. So we'll end
the word here. Our final word is barnagement!
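The whole walk from barn to barnagement can be sketched as a loop that keeps appending letters until "end of word" is chosen. This is a minimal sketch under stated assumptions: the `END` marker, the function names `build_table` and `extend`, the `max_len` safety cap, and the three-word toy list are all invented for illustration and stand in for the site's real word list and code.

```python
import random
from collections import Counter, defaultdict

END = "$"  # assumed end-of-word marker; must not appear in any word

def build_table(words, order=4):
    """Map each run of `order` letters to counts of what follows it (a letter or END)."""
    table = defaultdict(Counter)
    for word in words:
        padded = word + END
        for i in range(len(padded) - order):
            table[padded[i:i + order]][padded[i + order]] += 1
    return table

def extend(seed, table, order=4, rng=random, max_len=30):
    """Grow `seed` one letter at a time until END is picked or nothing follows."""
    word = seed
    while len(word) < max_len:
        counts = table.get(word[-order:])
        if not counts:
            break  # this context never appeared in the word list
        letters = list(counts)
        nxt = rng.choices(letters, weights=[counts[c] for c in letters], k=1)[0]
        if nxt == END:
            break  # ending the word is just another possible outcome
        word += nxt
    return word

# Toy word list: only enough words to make the example path possible.
table = build_table(["barnacle", "carnage", "management"])
print(extend("barn", table))  # one of: barnacle, barnage, barnagement
```

Treating the end of the word as one more "letter" in the counts is what lets the same weighted pick decide both which letter comes next and when to stop.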
-
We skipped talking about how to get the first 4 letters. Fortunately, it's very simple. If we don't have 4 letters chosen yet, we look at words
that start with the letters we do have. So the first letter will be chosen based on what letters are commonly seen at the start of a word. The
second letter will be chosen from all of the words that start with the first letter we chose previously. The third letter will look at all words
that start with the first two letters, and so on.
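Choosing the opening letters from word prefixes might look like the sketch below. The function name `pick_start_letter` is hypothetical, and for simplicity this version assumes that words exactly equal to the prefix are skipped rather than treated as a chance to end the word.

```python
import random
from collections import Counter

def pick_start_letter(words, prefix, rng=random):
    """Pick the letter after `prefix`, using only words that start with `prefix`.

    With an empty prefix this picks a first letter based on how often each
    letter begins a word. Returns None if no word continues the prefix.
    """
    counts = Counter(
        word[len(prefix)]
        for word in words
        if word.startswith(prefix) and len(word) > len(prefix)
    )
    if not counts:
        return None
    letters = list(counts)
    return rng.choices(letters, weights=[counts[c] for c in letters], k=1)[0]

words = ["barn", "bark", "cow"]
prefix = ""
while len(prefix) < 4:
    letter = pick_start_letter(words, prefix)
    if letter is None:
        break  # e.g. "cow" has no fourth letter
    prefix += letter
print(prefix)  # one of: barn, bark, cow
```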
-
Sometimes it happens that we create a word that's actually in our dictionary. In this case, we just throw it out and start over.
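The throw-it-out-and-retry step could be sketched like this. The names `generate_new_word` and `generate`, and the `max_tries` cap, are assumptions of this example; `generate` stands for any function that returns one candidate word per call.

```python
def generate_new_word(generate, dictionary, max_tries=1000):
    """Call `generate()` until it returns a word that is not in `dictionary`.

    Returns None if every attempt produced a real word (the cap is an
    assumption added here so the loop always terminates).
    """
    known = set(dictionary)  # set lookup keeps each check fast
    for _ in range(max_tries):
        candidate = generate()
        if candidate not in known:
            return candidate
    return None

# Usage with a stand-in generator that yields fixed candidates in order.
candidates = iter(["barn", "carnage", "barnagement"])
print(generate_new_word(lambda: next(candidates), ["barn", "carnage"]))  # barnagement
```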