Guest Post: Looking beyond the star rating – sentiment analysis for restaurant reviews

Contributed by Dillon Robinson. See his website here.

The timing was perfect. I’d just had bad food at a supposedly delicious restaurant when my brother began to rant about a neat new thing he had recently discovered called “kimono”. He had used kimono to build a web app that grabbed the restaurant’s UrbanSpoon rating and made fun of the (mediocre) score. It was a small and amusing test of what kimono could do. At that moment I was browsing the restaurant’s Yelp page, and that got me thinking: could I use kimono to pull data from Yelp in a similar way?

I didn’t have a plan, but I wanted to pull in the Yelp review text and see what I could do with it. Yelp can be mysterious in how it sorts, highlights, and even hides user reviews, and in the end every review is boiled down to a very shallow measurement: the star. The star rarely tells the whole story, so I decided to find out just how bad the reviews actually were, regardless of the score each reviewer assigned. First, I needed the review text itself.

The kimono implementation here was pretty simple. First I created a new API from the restaurant’s Yelp page. I introduced a little bias by sorting the reviews from worst to best, so that the negative reviews appeared at the top of the page. I then set my first and only data type to be the body of the review text.

Next I needed to evaluate this Yelp review text. This is where sentiment analysis (natural language analysis that determines the sentiment of a body of text) comes into play. You can run sentiment analysis on any text data aggregated with kimono. There are many exciting ways to use sentiment analysis; my site just scratches the surface.

Full-fledged sentiment analysis tools look at combinations of verbs, nouns, and adjectives in context. I took a much simpler, word-list-based route: I browsed a few psychology and language-based educational websites and amassed a list of English words (of any part of speech) that have a negative connotation or denotation. The list was 5,007 words long, plus 29 custom words that I added myself.
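
Checking every review word against a list of more than 5,000 entries is easiest with a hash-based set. The sketch below is one way to load such a list, assuming it is stored as plain text with one word per line; `loadNegativeWords` is a hypothetical name, not part of the original site. Entries are trimmed and lowercased so that lookups are case-insensitive.

```javascript
// Hypothetical helper: build a lookup set from a one-word-per-line list plus
// any custom additions. Entries are trimmed and lowercased; blanks are skipped.
function loadNegativeWords(listText, customWords = []) {
  const words = new Set();
  for (const entry of listText.split(/\r?\n/).concat(customWords)) {
    const word = entry.trim().toLowerCase();
    if (word.length > 0) {
      words.add(word);
    }
  }
  return words;
}

const negativeWords = loadNegativeWords("awful\nBland\n\nterrible\nawful", ["Overpriced"]);
console.log(negativeWords.size); // 4
console.log(negativeWords.has("overpriced")); // true
```

A `Set` also drops duplicates automatically, which helps when merging several source lists with a handful of custom words.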

Then I created a second array containing the words from the review text aggregated by kimono. Using jQuery, I stripped the punctuation out of the review text and split it on whitespace to build the new array.

var reviewarray = [];
// remove all punctuation, then split on runs of whitespace
var reviewwords = $('#reviews').text()
    .replace(/[.,\/#!?"$%^&*;:{}=_`~()]/g, " ")
    .split(/\s+/);

// grab every non-empty word from the reviews for the array "reviewarray";
// lowercase it so "Terrible" can match "terrible" (assumes negativelist is lowercase)
$.each(reviewwords, function(i, val){
    if (val.length > 0) {
        reviewarray.push(val.toLowerCase());
    }
});
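
The same tokenization step can also be written without jQuery. This is a sketch, not the site’s actual code: `tokenize` is a hypothetical name, and instead of listing punctuation characters to remove, it keeps runs of letters (in any script) and apostrophes and treats everything else as a separator, so contractions like “didn’t” stay in one piece.

```javascript
// Hypothetical jQuery-free tokenizer: lowercase the text and keep runs of
// letters and apostrophes (straight or curly) as words.
function tokenize(text) {
  const matches = text.toLowerCase().match(/[\p{L}'’]+/gu) || [];
  return matches
    .map((token) => token.replace(/’/g, "'").replace(/^'+|'+$/g, "")) // normalize curly apostrophes, trim stray quotes
    .filter((token) => token.length > 0);
}

console.log(tokenize("Cold fries,   rude staff... I didn't love it!").join(" "));
// cold fries rude staff i didn't love it
```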

Now I had two arrays of words: one from the reviews themselves and one from the list of negative words. Next, I made a third array containing the words that appear in both. This holds our results, the direct negativity found in our target review page. I loop through reviewarray (the words from the review text) and, for each item, check whether that word exists in the negativelist array. If there’s a match, I add the word to the badmatches array.

var badmatches = [];
var badmatchcount = 0;

$.each(reviewarray, function(key, value){
    var index = $.inArray(value, negativelist);
    if (index !== -1) {
        // this review word is also in the negative list: a matched word

        // highlight every instance of the matched word in the reviews section
        $('#reviews').highlight(negativelist[index]);
        badmatchcount++;
        badmatches.push(negativelist[index]); // store the matched word itself
        $("#matches").append(" " + negativelist[index] + " ");
    }
});
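
The loop above calls `$.inArray`, which scans the whole 5,000-word list for every review word. A sketch of the same matching step using a `Set` performs each lookup in roughly constant time and also tallies how often each negative word appears. `findNegativeMatches` is a hypothetical name, and the sketch assumes both inputs are already lowercased.

```javascript
// Hypothetical matcher: returns every matched word (in review order) plus a
// per-word tally, given review tokens and a Set of negative words.
function findNegativeMatches(reviewWords, negativeWords) {
  const badMatches = [];
  const counts = new Map();
  for (const word of reviewWords) {
    if (negativeWords.has(word)) {
      badMatches.push(word);
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return { badMatches, counts };
}

const negative = new Set(["cold", "rude", "bland"]);
const result = findNegativeMatches(["cold", "fries", "rude", "staff", "cold"], negative);
console.log(result.badMatches.length); // 3
console.log(result.counts.get("cold")); // 2
```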

The script mentioned above is Highlight v4 by Johann Burkard. It wraps each occurrence of a given word in its own element with a CSS class, so I can style just the matched words, not the whole chunk of text, and make them stand out visually.
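
For comparison, a similar effect can be produced by building the HTML string directly. This is only a stand-in under stated assumptions, not how Highlight v4 works internally (the plugin operates on text nodes already in the page): `highlightWords` and `escapeHtml` are hypothetical names, and the `highlight` class name is assumed to be whatever your CSS gives a yellow background. The text is split into word and non-word pieces, each piece is HTML-escaped, and only words found in the set get wrapped.

```javascript
// Hypothetical helpers: escape text for HTML, then wrap whole-word,
// case-insensitive matches in <span class="highlight">.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function highlightWords(text, negativeWords) {
  // Alternate between word pieces and non-word pieces so every character is
  // covered; escape each piece and wrap only the negative words.
  return text.replace(/[\p{L}'’]+|[^\p{L}'’]+/gu, (piece) => {
    const safe = escapeHtml(piece);
    return negativeWords.has(piece.toLowerCase())
      ? `<span class="highlight">${safe}</span>`
      : safe;
  });
}

console.log(highlightWords("The soup was Cold & bland.", new Set(["cold", "bland"])));
// The soup was <span class="highlight">Cold</span> &amp; <span class="highlight">bland</span>.
```

Escaping each piece before wrapping keeps review text such as `<b>` from being interpreted as markup.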

Here is our result:

[Screenshot: review text with the matched negative words highlighted in yellow]

The most negative reviews are listed first, so we see a lot more yellow at the top than at the bottom! The script also puts together a short summary:

[Screenshot: summary of the negative-word matches]
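
The exact contents of that summary aren’t spelled out here, so the following is only a guess at what such a summary might compute: the total word count, how many words matched the negative list, the percentage that represents, and the most frequent matched words. `summarize` and its return fields are hypothetical.

```javascript
// Hypothetical summary: word totals, the share of words that matched the
// negative list, and the most frequent matched words.
function summarize(reviewWords, negativeWords, topN = 3) {
  const counts = new Map();
  for (const word of reviewWords) {
    if (negativeWords.has(word)) {
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  let matchCount = 0;
  for (const n of counts.values()) matchCount += n;
  const percentNegative =
    reviewWords.length === 0 ? 0 : (100 * matchCount) / reviewWords.length;
  // Sort by frequency, breaking ties alphabetically so the order is stable.
  const top = [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, topN);
  return { totalWords: reviewWords.length, matchCount, percentNegative, top };
}

const words = ["cold", "fries", "rude", "staff", "cold", "bland", "service", "ok"];
const summary = summarize(words, new Set(["cold", "rude", "bland"]));
console.log(`${summary.matchCount} of ${summary.totalWords} words negative (${summary.percentNegative.toFixed(1)}%)`);
// 4 of 8 words negative (50.0%)
console.log(summary.top.map(([w, n]) => `${w} x${n}`).join(", "));
// cold x2, bland x1, rude x1
```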

I decided to have a bit more mean fun with my data and created an ad-lib that you can play using the words from the badmatches array:

[Screenshot: ad-lib filled in with words from the badmatches array]
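
An ad-lib like this can be sketched as a template whose blanks are filled with randomly chosen matched words. The `___` blank marker and the `fillAdLib` name are assumptions for illustration. The random source is a parameter so that a fixed function can stand in for `Math.random` when repeatable output is needed.

```javascript
// Hypothetical ad-lib filler: replaces each "___" in a template with a word
// drawn from badMatches. `random` must return a number in [0, 1).
function fillAdLib(template, badMatches, random = Math.random) {
  if (badMatches.length === 0) {
    return template; // nothing to fill the blanks with
  }
  return template.replace(/___/g, () => {
    const i = Math.floor(random() * badMatches.length);
    return badMatches[i];
  });
}

const template = "The ___ waiter brought me a ___ plate of ___ noodles.";
console.log(fillAdLib(template, ["rude", "cold", "bland"]));

// A fixed "random" source that always returns 0 always picks the first word.
console.log(fillAdLib(template, ["rude", "cold", "bland"], () => 0));
// The rude waiter brought me a rude plate of rude noodles.
```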

And there you have it! Again, this is only a cursory test: a brief exploration of what can be done with web data and text analytics. I believe sentiment analysis will become even more powerful as tools like kimono let anyone access massive amounts of data.