PROJECT TITLE:
Towards Real-Time, Country-Level Location Classification of Worldwide Tweets - 2017
The increase of interest in using social media as a source for analysis has motivated tackling the challenge of automatically geolocating tweets, given the lack of explicit location information in the majority of tweets. In contrast to much previous work that has focused on location classification of tweets restricted to a specific country, here we undertake the task in a broader context by classifying global tweets at the country level, which is so far unexplored in a real-time scenario. We analyze the extent to which a tweet's country of origin can be determined by making use of eight tweet-inherent features for classification. Furthermore, we use two datasets, collected a year apart from each other, to analyze the extent to which a model trained on historical tweets can still be leveraged for classification of new tweets. With classification experiments on all 217 countries in our datasets, as well as on the top 25 countries, we offer some insights into the best use of tweet-inherent features for accurate country-level classification of tweets. We find that the use of a single feature, such as tweet content alone (the most widely used feature in previous work), leaves much to be desired. Choosing an appropriate combination of both tweet content and metadata can actually lead to substantial improvements of between 20 and 50 percent. We observe that tweet content, the user's self-reported location and the user's real name, all of which are inherent in a tweet and available in a real-time scenario, are particularly useful for determining the country of origin. We also experiment on the applicability of a model trained on historical tweets to classify new tweets, finding that the choice of a particular combination of features whose utility does not fade over time can actually lead to comparable performance, avoiding the need to retrain. However, the difficulty of achieving accurate classification increases slightly for countries with multiple commonalities, especially for English- and Spanish-speaking countries.
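The idea of combining tweet content with metadata can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not name a classifier, so a small multinomial Naive Bayes stands in here, and the field names (`content`, `user_location`, `user_name`), the class name and the toy tweets are all hypothetical. Each field's words are prefixed with the field name, so the same word in the content and in the self-reported location counts as two different features.

```python
import math
from collections import Counter, defaultdict

# Hypothetical field names: the abstract names these three sources, not a schema.
FIELDS = ("content", "user_location", "user_name")


def featurize(tweet, fields=FIELDS):
    """Turn the selected tweet fields into field-prefixed lowercase tokens."""
    tokens = []
    for field in fields:
        for word in tweet.get(field, "").lower().split():
            tokens.append(f"{field}={word}")
    return tokens


class NaiveBayesCountryClassifier:
    """Multinomial Naive Bayes with add-one smoothing (illustrative stand-in)."""

    def __init__(self, fields=FIELDS):
        self.fields = fields
        self.class_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, tweets, countries):
        for tweet, country in zip(tweets, countries):
            self.class_counts[country] += 1
            for token in featurize(tweet, self.fields):
                self.token_counts[country][token] += 1
                self.vocab.add(token)
        return self

    def predict(self, tweet):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_country, best_score = None, -math.inf
        for country, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.token_counts[country].values()) + vocab_size
            for token in featurize(tweet, self.fields):
                if token in self.vocab:  # skip tokens never seen in training
                    seen = self.token_counts[country][token]
                    score += math.log((seen + 1) / denom)
            if score > best_score:
                best_country, best_score = country, score
        return best_country


# Toy, made-up training data for illustration only.
train_tweets = [
    {"content": "great match at wembley tonight",
     "user_location": "london uk", "user_name": "james smith"},
    {"content": "lovely tea this morning",
     "user_location": "manchester", "user_name": "emma jones"},
    {"content": "gran partido esta noche",
     "user_location": "madrid", "user_name": "carlos garcia"},
    {"content": "que buen dia en la playa",
     "user_location": "valencia", "user_name": "lucia martinez"},
]
train_countries = ["GB", "GB", "ES", "ES"]

content_only = NaiveBayesCountryClassifier(fields=("content",))
content_only.fit(train_tweets, train_countries)
combined = NaiveBayesCountryClassifier().fit(train_tweets, train_countries)

new_tweet = {"content": "hello world",
             "user_location": "madrid", "user_name": "carlos"}
print("content only:", content_only.predict(new_tweet))
print("combined:    ", combined.predict(new_tweet))
```

On this toy data the content-only model has no known words to work with for the new tweet and falls back to the class prior, while the combined model can use the location and name tokens. This only illustrates the mechanism; it says nothing about the size of the improvement reported above.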