Project Skeptical

This is a submission to the 12-hour Mozilla World of Hack hackathon held on Friday, July 22. The general idea of the project is to sift through the world's most reliable news source, the Weekly World News, and look for patterns in the text. Those patterns can then be used to analyze a piece of text and determine how similar it is to the Weekly World News, or how "skeptical" the text is.

There are two parts to the setup. First, a suite of Python scripts scrubs the Weekly World News archives on Google Books and builds a set of the most commonly used words by year. The data is gathered from the word cloud on every Google Books Weekly World News article. Several data sets were produced, including one where at least one word in a phrase must match a word in the English dictionary (which removes most proper nouns). The results are dumped into a JSON file (the initial plan was an HTML5 local database), and that JSON file can be browsed through a jQuery table display.

Second, the JSON file is used to analyze how "sketchy" a piece of writing is: the text is broken into words, and the occurrence counts of those words in the Weekly World News word-cloud database are added up. Rough sketches of both steps follow.

More work is deserved in both areas. The data scrubbing process should drop common words like "a" and "the", and the "sketchy" scoring algorithm should be refined so that matching phrases receive more points than single words.
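
To make the scrubbing step concrete, here is a minimal sketch of pulling the word cloud from one Google Books page and applying the dictionary filter. The URL, the `div.cloud` selector, and the output file name are illustrative placeholders, not the markup the real scripts target; see the linked scrubbing scripts for the actual implementation.

```python
# Sketch of the scrubbing step: fetch one Google Books page, collect its
# word-cloud terms, and keep phrases with at least one dictionary word.
import json
import urllib.request

from bs4 import BeautifulSoup
import enchant

DICTIONARY = enchant.Dict("en_US")  # PyEnchant spell checker

def has_dictionary_word(phrase):
    """True when at least one word in the phrase spell-checks as English."""
    return any(DICTIONARY.check(word) for word in phrase.split() if word.isalpha())

def scrub_page(url, counts):
    """Add the word-cloud terms from one page to the running counts."""
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, "html.parser")
    # Placeholder selector: assumes the cloud is rendered as <a> tags
    # inside a container with class "cloud" -- not the real markup.
    for link in soup.select("div.cloud a"):
        phrase = link.get_text().strip().lower()
        if phrase and has_dictionary_word(phrase):
            counts[phrase] = counts.get(phrase, 0) + 1
    return counts

# Hypothetical usage: one archive page in, a JSON dump out.
counts = scrub_page("https://books.google.com/...", {})
with open("wwn_words.json", "w") as f:
    json.dump(counts, f)
```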

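The scoring side is then a lookup-and-sum over that JSON file, roughly as sketched below; the file name and the tokenizer are simplified stand-ins for what the actual script does.

```python
# Sketch of the "sketchy" scorer: load the word-frequency JSON produced by
# the scrubbing step and sum the Weekly World News occurrence count of
# every word in the input text.
import json
import re

def load_counts(path="wwn_words.json"):
    """Load the word -> occurrence-count map from the scrubbing step."""
    with open(path) as f:
        return json.load(f)

def sketchy_score(text, counts):
    """Sum the Weekly World News occurrences of each word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(counts.get(word, 0) for word in words)

counts = load_counts()
print(sketchy_score("Bat Boy spotted at the White House", counts))
```

Note that this naive version also scores words like "the", which is exactly the stop-word cleanup flagged above.
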
Weekly World News Explorer

How "sketchy" is a block of text?

Link to Google Books Scrubbing Scripts

Technologies used in display: jQuery, jQuery.dataTables, jQuery.ui

Technologies used in scrubbing: Python, Beautiful Soup (HTML processing), PyEnchant (spell check)