
Example: Hot topics on world news, 2017


This article is a continuation of the previous post, Reddit text mining and visualization with R Shiny. Here we introduce techniques for exploring topics on the Reddit worldnews board.

Take a look at the data

First, we load the data and restrict it to posts between 2017-01-01 and 2017-05-01, the most recent date in my collection.
This leaves 8,785 posts, with a median of 9 points and 3 comments per post. The number of posts over time is stable, at about 500 posts per week.
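As an illustration of this filtering and summary step, here is a minimal Python sketch. The original analysis was done in R, so this is not the author's code. It assumes each post is a dict with hypothetical `date`, `points`, and `comments` keys, and that at least one post falls inside the window.

```python
from collections import Counter
from datetime import date
from statistics import median


def summarize(posts, start=date(2017, 1, 1), end=date(2017, 5, 1)):
    """Keep posts dated within [start, end] and summarize them.

    Each post is assumed to be a dict with the hypothetical keys
    'date' (datetime.date), 'points' (int) and 'comments' (int).
    Assumes at least one post falls inside the window.
    """
    kept = [p for p in posts if start <= p["date"] <= end]
    # Count posts per (ISO year, ISO week) to see whether volume is stable.
    weekly = Counter(tuple(p["date"].isocalendar())[:2] for p in kept)
    return {
        "n_posts": len(kept),
        "median_points": median(p["points"] for p in kept),
        "median_comments": median(p["comments"] for p in kept),
        # Average over the weeks that contain at least one post.
        "posts_per_week": len(kept) / len(weekly),
        "weekly_counts": weekly,
    }


posts = [
    {"date": date(2016, 12, 31), "points": 50, "comments": 9},  # outside the window
    {"date": date(2017, 1, 2), "points": 9, "comments": 3},
    {"date": date(2017, 1, 3), "points": 4, "comments": 1},
    {"date": date(2017, 1, 10), "points": 20, "comments": 7},
]
stats = summarize(posts)
print(stats["n_posts"], stats["median_points"], stats["posts_per_week"])  # prints: 3 9 1.5
```

Grouping by ISO week (Monday to Sunday) is one reasonable choice for a posts-per-week check; the weekly counts can be plotted directly to see whether volume is stable.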

To see who makes the posts, we plot a bar chart of the number of posts per author.
One user made over 300 posts, while every other user contributed fewer than 100. Most users made no more than 10.
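The per-author counts behind such a bar chart take only a few lines. This Python sketch (not the author's R code) assumes a hypothetical input of one author name per post. It returns the authors ranked by post count, along with the share of authors at or below a chosen post count.

```python
from collections import Counter


def author_activity(authors, small=10):
    """Rank authors by post count and report the share of light posters.

    `authors` is a hypothetical input: one author name per post.
    Returns (ranking, share), where ranking lists (author, n_posts) from
    most to least active and share is the fraction of authors with at
    most `small` posts.
    """
    counts = Counter(authors)
    ranking = counts.most_common()
    share = sum(1 for n in counts.values() if n <= small) / len(counts)
    return ranking, share


ranking, share = author_activity(["a"] * 5 + ["b"] * 2 + ["c"], small=2)
print(ranking, share)
```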

Find keywords

Next, we plot the keywords in a bar chart ranked by tf-idf.
There are important keywords such as "trump", "china", and "korea". But we also see modifiers such as "north" and "south" that only make sense when paired with a noun.
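For readers unfamiliar with tf-idf, here is a minimal Python sketch of one common convention: tf is a term's count divided by the document length, and idf is the natural log of the number of documents over the number containing the term. It is not the author's R code. Treating each title as one document and ranking terms by their highest score in any title are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a title and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Return {(doc_index, term): score} for a list of raw texts.

    tf  = occurrences of term in the doc / number of tokens in the doc
    idf = ln(number of docs / number of docs containing the term)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = {}
    for i, toks in enumerate(tokenized):
        for term, c in Counter(toks).items():
            scores[(i, term)] = c / len(toks) * math.log(n_docs / df[term])
    return scores


def top_keywords(scores, k=10):
    """Rank terms by their highest tf-idf score in any single document."""
    best = {}
    for (_, term), s in scores.items():
        best[term] = max(s, best.get(term, s))
    return sorted(best, key=best.get, reverse=True)[:k]


titles = ["north korea missile", "trump travel ban", "north korea trump"]
print(top_keywords(tf_idf(titles), k=3))
```

A word that appears in every document gets idf = ln(1) = 0, so ubiquitous words score zero no matter how often they occur.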

Next, we plot the most frequent bigrams.
Combining single terms and bigrams, we get the following list of keywords:

north korea
donald trump
south korea
marine le pen
kim jong
saudi arabia
human rights
travel ban
south china
climate change
iraqi forces
israel
syria
turkey
iran
russia
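The bigram counts behind this list could be produced along the following lines. Each title is split into lowercase words, each word is paired with its neighbor, and any pair containing a stop word is dropped. This is a Python sketch rather than the author's R code, and the tiny stop-word list is illustrative only; a real analysis would use a full list.

```python
import re
from collections import Counter

# Illustrative stop-word list only; a real analysis would use a full list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "for", "is", "says"}


def top_bigrams(titles, k=20, stop_words=STOP_WORDS):
    """Count adjacent word pairs within each title, skipping stop words."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z]+", title.lower())
        for w1, w2 in zip(words, words[1:]):
            # Drop the pair if either word is a stop word.
            if w1 not in stop_words and w2 not in stop_words:
                counts[f"{w1} {w2}"] += 1
    return counts.most_common(k)


titles = ["North Korea fires missile", "South Korea and North Korea talk", "The travel ban"]
print(top_bigrams(titles, k=3))
```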

From keywords to topics

To dig deeper into each keyword and find the stories behind it, we can narrow down the data with a filter.

Here, we keep only posts containing both "korea" and "north", then plot a bigram cloud.

We see topics such as "kim jong nam" and "ballistic missile test". Donald Trump also appears on the periphery.
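The filter-then-count step can be sketched as follows, in Python rather than the R used for the original analysis. It keeps the titles that contain every keyword and then counts bigrams within that subset, which is the kind of frequency a word cloud uses for sizing. The function names are hypothetical, and drawing the cloud itself is left to a plotting library.

```python
import re
from collections import Counter


def filter_titles(titles, keywords):
    """Keep titles containing every keyword (case-insensitive substring test)."""
    return [t for t in titles if all(k in t.lower() for k in keywords)]


def bigram_weights(titles):
    """Bigram frequencies in a set of titles, usable as word-cloud sizes."""
    counts = Counter()
    for t in titles:
        words = re.findall(r"[a-z]+", t.lower())
        counts.update(f"{a} {b}" for a, b in zip(words, words[1:]))
    return counts


titles = [
    "North Korea test-fires ballistic missile",
    "Kim Jong Nam killed in Malaysia",
    "South Korea elects new president",
    "North Korea: Kim Jong Nam death",
]
subset = filter_titles(titles, ["korea", "north"])
print(bigram_weights(subset).most_common(3))
```

Because this is a plain substring test, "north" would also match "northern"; the padding trick discussed below addresses that kind of false match.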

We can also compute pairwise word correlations without specifying a keyword, so that all of the top-correlated pairs show up.
The result is similar: we see "jong un", "jong nam", and "ballistic missile".
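One common way to measure pairwise word correlation is the phi coefficient. Each title is treated as a set of words, and the presence of two words is compared using a 2x2 table of counts: titles with both words, with only one, or with neither. The Python sketch below assumes this definition and is not necessarily the exact method the author used. The `min_count` cutoff, which drops rare words, is a hypothetical parameter.

```python
import math
import re
from collections import Counter
from itertools import combinations


def phi(n11, n10, n01, n00):
    """Phi coefficient of a 2x2 table of co-occurrence counts.

    n11: titles with both words, n10: only the first, n01: only the
    second, n00: neither.
    """
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


def pairwise_correlations(titles, min_count=2):
    """Phi correlation for every pair of words found in >= min_count titles.

    Returns (word_a, word_b, phi) tuples sorted from most to least correlated.
    """
    sets = [set(re.findall(r"[a-z]+", t.lower())) for t in titles]
    n = len(sets)
    df = Counter(w for s in sets for w in s)
    vocab = sorted(w for w, c in df.items() if c >= min_count)
    keep = set(vocab)
    # Count co-occurrences; pairs are sorted so (a, b) always has a < b.
    both = Counter(p for s in sets for p in combinations(sorted(s & keep), 2))
    results = []
    for a, b in combinations(vocab, 2):
        n11 = both[(a, b)]
        n10, n01 = df[a] - n11, df[b] - n11
        results.append((a, b, phi(n11, n10, n01, n - n11 - n10 - n01)))
    return sorted(results, key=lambda r: r[2], reverse=True)


titles = ["jong un missile", "jong un test", "kim jong nam", "trump tweets", "trump tweets again"]
print(pairwise_correlations(titles)[:2])
```

The number of pairs grows quadratically with the vocabulary, which is one reason to drop rare words before correlating.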

We do the same for "le pen". Note that when searching by keyword, "le" and "pen" should be padded with spaces, as in " le " and " pen ". Otherwise, words such as "missile", which contains "le", will be mistakenly selected.
There are pairs such as "front national", "party leader", "french presidential election", and "candidate marine le pen", which tell us this is political news about the presidential election in France.
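The difference between a plain substring search and a space-padded search can be sketched in a few lines of Python (not the author's code; function names are hypothetical). Two additions to the padding idea are assumptions. First, non-letters in the title are turned into spaces so that "Pen's" still matches. Second, the title itself is padded so that words at its start or end also match.

```python
import re


def naive_match(title, word):
    """Plain substring test: "le" also matches inside "missile"."""
    return word in title.lower()


def word_match(title, word):
    """Whole-word test in the spirit of padding keywords with spaces.

    Non-letters in the title become spaces, so "Pen's" or "Pen:" still
    match, and the title is padded too, so words at its start or end match.
    """
    padded = " " + re.sub(r"[^a-z]+", " ", title.lower()) + " "
    return f" {word} " in padded


print(naive_match("North Korea fires missile", "le"))  # prints: True
print(word_match("North Korea fires missile", "le"))   # prints: False
```

Multi-word keywords such as "le pen" work the same way, since the padded keyword " le pen " only matches the two words side by side.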

A more general approach

Besides finding topics from pre-screened keywords, we can also look for topics through word correlation.
Here, we explore topics about attacks. This table shows the words most correlated with "attack": we see "istanbul", "nightclub", "terrorist", and so on.

Now we use the topic modeling algorithm LDA (latent Dirichlet allocation) to find topics. We first remove "trump", "says", and "new", since they appear everywhere.
Topic 1 contains "russia", "turkey", and "attack"; we might guess it is about military conflicts in West Asia. Topic 2 has "donald", "US", "iran", "saudi", and "ban"; it could be the travel ban on several Muslim-majority countries. Topic 3 has "north", "korea", and "missile"; it is clearly about North Korea's ballistic missile tests. Topic 5 has "eu", "brexit", and "minister"; we might guess it is about European politics. Topics 4 and 6 are unclear, with too many nonspecific words.
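To show what LDA does under the hood, here is a compact collapsed Gibbs sampler in plain Python. It is an illustrative sketch, not the author's code and not necessarily the fitting method the author used. The hyperparameters `alpha` and `beta`, the iteration count, and the seed are arbitrary assumptions, while the stop-word set mirrors the three words removed in the text.

```python
import random
import re


def lda_gibbs(titles, n_topics=6, alpha=0.1, beta=0.01, iters=200,
              stop_words=frozenset({"trump", "says", "new"}), top_n=5, seed=0):
    """Fit LDA by collapsed Gibbs sampling and return the top words per topic."""
    rng = random.Random(seed)
    docs = [[w for w in re.findall(r"[a-z]+", t.lower()) if w not in stop_words]
            for t in titles]
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]              # tokens per (doc, topic)
    topic_word = [[0] * n_vocab for _ in range(n_topics)]   # tokens per (topic, word)
    topic_total = [0] * n_topics                            # tokens per topic

    # Start from a random topic for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = [rng.randrange(n_topics) for _ in doc]
        for w, k in zip(doc, z_d):
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # P(topic j | all other assignments), up to a constant.
                weights = [(doc_topic[d][j] + alpha) * (topic_word[j][v] + beta)
                           / (topic_total[j] + n_vocab * beta)
                           for j in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # For each topic, list the words assigned to it most often.
    return [[vocab[v] for v in sorted(range(n_vocab), key=lambda v: -topic_word[k][v])[:top_n]]
            for k in range(n_topics)]


titles = ["north korea missile test", "north korea missile launch",
          "trump says eu brexit", "eu brexit minister talks"] * 3
for number, words in enumerate(lda_gibbs(titles, n_topics=2, iters=50), start=1):
    print(number, words)
```

Topic numbering is arbitrary and changes with the seed, so a label such as "topic 3" only makes sense for a particular fit.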

Conclusion

Though not perfect, these tools help us screen, model, and visualize keywords and topics across thousands of titles. This example demonstrates only a few techniques that are easy to implement; there is much more to explore. Keep in mind that text mining requires background knowledge of the topics: the more you know about the data, the more insight you can find.
