Class Exercise 5 uses the Taco Bell Tweets dataset, a collection of 9413 tweets that mention "Taco Bell", posted between January 24 and 31, 2011.
The first step is to clean up the raw data. The columns for source, author, title, item link, author link, sentiment and tags were deleted, leaving only the date/time and body. After using the tag cloud function in Many Eyes, we decided to focus on the tweets containing the word "fake", which left us with 137 tweets. Next, we categorised the tweets by day, i.e. Day 1 to Day 8. Each tweet body was then replaced with the number 1, to signify one count of a tweet containing the word "fake". The cleaned-up data looks like the following.
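For readers who prefer to script the clean-up, the steps above can be sketched in Python. This is only an illustration, not how we did it for the exercise. The function name `count_keyword_by_day`, the timestamp format, the choice of January 24 as Day 1, and the sample rows are all assumptions made for the example.

```python
from collections import Counter
from datetime import date, datetime

# Assumption: Day 1 is January 24, 2011, the first day covered by the dataset.
FIRST_DAY = date(2011, 1, 24)


def count_keyword_by_day(rows, keyword="fake", date_format="%Y-%m-%d %H:%M:%S"):
    """Count tweets containing `keyword` for each day number (1 to 8).

    `rows` is an iterable of (date_time_string, body) pairs, i.e. the two
    columns kept after cleaning. The timestamp format is an assumption and
    may differ in the real export.
    """
    counts = Counter()
    for date_time, body in rows:
        # Case-insensitive substring match; this would also match longer
        # words that happen to contain the keyword.
        if keyword.lower() not in body.lower():
            continue
        tweet_date = datetime.strptime(date_time, date_format).date()
        day = (tweet_date - FIRST_DAY).days + 1
        counts[day] += 1  # each matching tweet contributes a count of 1
    # Include days with zero matches so every day from 1 to 8 appears.
    return {day: counts.get(day, 0) for day in range(1, 9)}


# Made-up sample rows for illustration only; they are not from the dataset.
sample = [
    ("2011-01-24 10:15:00", "Is the Taco Bell beef fake?"),
    ("2011-01-26 09:00:00", "Fake meat at Taco Bell, really?"),
    ("2011-01-26 18:30:00", "taco bell beef is FAKE apparently"),
    ("2011-01-27 12:00:00", "Lunch at Taco Bell today"),
]
print(count_keyword_by_day(sample))
```

The resulting dictionary maps each day number to its count, which is the same shape as the cleaned-up data used for the visualisation.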
From the visualisation, it can be seen that Day 3 has the highest number of tweets using the word "fake". The count falls over the next two days, then rises again on Day 6. Something may have happened on Day 3 and Day 6 that caused the increase in the number of tweets using the word "fake".