Word Clouds from the Gospels (and Acts)
Sep 30, 2014
Someone recently suggested to me that it would be interesting to see word clouds generated from the text of the Bible. Since I had done similar work before, I took it on as a fun project tonight, using Ruby and R.
A word cloud is a graphic that shows which words dominate a text, based on how frequently they appear. The most frequent words are drawn in the largest sizes, and words that appear a similar number of times share a color.
The text I used came from public domain copies of Matthew through Acts (KJV translation). I used a set of stop words available from this site: http://www.ai.mit.edu/projects/jmlr…. This list provided a good base for eliminating the most common words (e.g., the, and, but), which wouldn’t communicate much in the finished graphic. To that list, I added some words specific to this translation: art, cometh, hath, hast, lo, saith, shalt, spake, thee, thou, thy, thine, wilt, ye.
Once the stop words were removed, I calculated the frequency of each remaining word in each book and used those frequency lists to generate the images below.
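For anyone who wants to try this, here is a minimal Ruby sketch of the stop word removal and counting step. It is not the script I actually ran. The names `BASE_STOP_WORDS`, `KJV_STOP_WORDS` and `word_frequencies` are made up for illustration, and the base list is a tiny stand-in for the full downloaded list. The tokenizing rule (lowercase everything, keep runs of letters with an optional apostrophe) is also just an assumption.

```ruby
require "set"

# Tiny hypothetical stand-in for the downloaded general stop word list.
BASE_STOP_WORDS = %w[a an and but in of the to was with].freeze

# The translation-specific additions listed in the post.
KJV_STOP_WORDS = %w[art cometh hath hast lo saith shalt spake
                    thee thou thy thine wilt ye].freeze

STOP_WORDS = Set.new(BASE_STOP_WORDS + KJV_STOP_WORDS).freeze

# Returns [word, count] pairs, most frequent first (ties sorted alphabetically).
def word_frequencies(text, stop_words = STOP_WORDS)
  # Lowercase the text and pull out words, allowing one inner apostrophe.
  words = text.downcase.scan(/[a-z]+(?:'[a-z]+)?/)

  # Count every word that is not a stop word.
  counts = words.each_with_object(Hash.new(0)) do |word, tally|
    tally[word] += 1 unless stop_words.include?(word)
  end

  counts.sort_by { |word, count| [-count, word] }
end

# Usage: print tab-separated word/count lines for one verse (John 1:1).
verse = "In the beginning was the Word, and the Word was with God, and the Word was God."
word_frequencies(verse).each { |word, count| puts "#{word}\t#{count}" }
```

Running the same function over the full text of each book gives one frequency list per book. Tab-separated lines like these are easy to read into R for drawing the clouds.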
How familiar are you with each of these books? Can you recognize the word fingerprint of each book?
NOTE: I placed Luke and Acts next to each other below, since both books are traditionally attributed to Luke.