
Identifying Data Noises, User Biases, and System Errors in Geo-tagged Twitter Messages (Tweets) (1712.02433v1)

Published 6 Dec 2017 in cs.SI

Abstract: Many social media researchers and data scientists collect geo-tagged tweets to conduct spatial analysis or to identify spatiotemporal patterns in messages filtered for specific topics or events. This paper provides a systematic view of the characteristics (data noise, user biases, and system errors) of geo-tagged tweets from the Twitter Streaming API. First, we found that a small percentage (1%) of active Twitter users can create a large portion (16%) of geo-tagged tweets. Second, a significant share (57.3%) of geo-tagged tweets is located outside the Twitter Streaming API's bounding box for San Diego. Third, we can detect spam, bot, and cyborg tweets (data noise) by examining the "source" metadata field. The portion of data noise in geo-tagged tweets is significant in our case study (29.42% in San Diego, CA and 53.47% in Columbus, OH). Finally, the majority of geo-tagged tweets are not created by the generic Twitter apps on Android or iPhone devices, but by other platforms, such as Instagram and Foursquare. We recommend a multi-step procedure to remove this noise in future research projects that use geo-tagged tweets.
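The checks named in the abstract (the bounding box, the "source" field, and per-user concentration) can be illustrated with a minimal Python sketch. This is not the authors' procedure, which the abstract does not spell out. The record layout (`user`, `lon`, `lat`, `source` keys with a plain-text source name), the `TRUSTED_SOURCES` whitelist, and the function names are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical whitelist of "source" values treated as human-operated clients.
# The abstract does not list which sources count as noise; adjust to your data.
TRUSTED_SOURCES = {
    "Twitter for iPhone",
    "Twitter for Android",
    "Instagram",
    "Foursquare",
}


def in_bbox(lon, lat, bbox):
    """Return True if the point lies inside bbox = (west, south, east, north)."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


def clean_tweets(tweets, bbox, trusted_sources=TRUSTED_SOURCES, max_per_user=None):
    """Filter tweet records (assumed dicts with user, lon, lat, source keys)."""
    # Step 1: drop tweets whose coordinates fall outside the requested box.
    kept = [t for t in tweets if in_bbox(t["lon"], t["lat"], bbox)]
    # Step 2: drop tweets whose "source" is not on the whitelist.
    kept = [t for t in kept if t["source"] in trusted_sources]
    # Step 3 (optional): drop every tweet from users above a per-user cap.
    if max_per_user is not None:
        counts = Counter(t["user"] for t in kept)
        kept = [t for t in kept if counts[t["user"]] <= max_per_user]
    return kept


def top_user_share(tweets, fraction=0.01):
    """Share of tweets written by the most active `fraction` of users."""
    counts = sorted(Counter(t["user"] for t in tweets).values(), reverse=True)
    if not counts:
        return 0.0
    k = max(1, int(len(counts) * fraction))  # always include at least one user
    return sum(counts[:k]) / sum(counts)
```

Calling `top_user_share(tweets)` before filtering gives a quick view of how concentrated a sample is, comparable in spirit to the 1% of users and 16% of tweets figure reported above. The order of the filtering steps and the per-user cap are design choices of this sketch, not recommendations taken from the paper.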

Citations (21)
