
A Survey of Location Prediction on Twitter

Published 9 May 2017 in cs.SI and cs.IR | (1705.03172v2)

Abstract: Locations, e.g., countries, states, cities, and point-of-interests, are central to news, emergency events, and people's daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on daily basis. Due to the world-wide coverage of its users and real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research efforts are spent on dealing with new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim at offering an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing Twitter network, tweet content, and tweet context as potential inputs, we then structurally highlight how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we also briefly review two related problems, i.e., semantic location prediction and point-of-interest recommendation. Finally, we list future research directions.

Citations (204)

Summary

  • The paper systematically categorizes location prediction on Twitter into home, tweet, and mentioned location tasks through a multi-source data integration framework.
  • It evaluates word-centric, location-centric, and geo-topic models to address the challenges posed by tweet brevity and noise.
  • The survey underscores the role of social ties and emerging deep learning techniques in advancing geolocation accuracy in dynamic online environments.

Insights into Location Prediction on Twitter: An Expert Overview

The paper "A Survey of Location Prediction on Twitter" by Xin Zheng, Jialong Han, and Aixin Sun serves as a comprehensive synthesis of methodologies and challenges surrounding location prediction using Twitter data. This survey is pivotal for researchers keen on exploring the depths of inferring geographical locations from data shared on one of the most dynamic social networks. It dissects the task into three primary problem areas: user home location prediction, tweet location prediction, and mentioned location prediction. Each of these tasks leverages Twitter's rich information ecosystem, combining textual content, user networks, and contextual metadata.

The survey brings to light the significance of integrating multiple data sources to tackle the inherent noise and brevity typical of tweets. Each problem domain is analyzed with respect to how effectively it utilizes tweet content, user networks, and contextual information. This paper delineates clear methodological pathways for each task, alongside the challenges unique to Twitter's ecosystem.

Home Location Prediction

Home location prediction is the more stable task within this domain, since a user's home changes far less often than the location of any single tweet. It supports location-based services and targeted content delivery. The survey classifies existing strategies into word-centric and location-centric approaches.

  1. Word-Centric Approaches: These methods mine tweets for words that indicate particular locales, using spatial variation models or Gaussian mixture models to estimate where each word is used. The main challenge is distinguishing "local words" from generic ones.
  2. Location-Centric Approaches: These approaches shift the focus from individual words to locations, classifying users into a predetermined set of candidate locations. Each location is often represented by a pseudo-document or a broader geographic structure.
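The location-centric idea can be illustrated with a minimal sketch. It assumes a toy set of tweets labeled with a home city, builds one pseudo-document per city, and picks the city whose smoothed unigram model best explains a new user's tweets. The function names, the training data, and the smoothing parameter `alpha` are illustrative assumptions, not taken from the survey.

```python
import math
from collections import Counter, defaultdict

def build_pseudo_documents(labeled_tweets):
    """Merge the tokens of all tweets from each location into one bag of words."""
    docs = defaultdict(Counter)
    for location, text in labeled_tweets:
        docs[location].update(text.lower().split())
    return docs

def predict_home(docs, user_tweets, alpha=1.0):
    """Return the location whose add-alpha smoothed unigram model gives
    the user's tokens the highest log-likelihood."""
    vocab = set()
    for counts in docs.values():
        vocab.update(counts)
    tokens = [t for text in user_tweets for t in text.lower().split()]
    best_loc, best_score = None, -math.inf
    for loc, counts in docs.items():
        denom = sum(counts.values()) + alpha * len(vocab)
        score = sum(math.log((counts[t] + alpha) / denom) for t in tokens)
        if score > best_score:
            best_loc, best_score = loc, score
    return best_loc

# Toy training data (made up for illustration).
training = [
    ("new_york", "stuck on the subway in brooklyn again"),
    ("new_york", "bagels near the subway on broadway"),
    ("houston", "rodeo traffic on the bayou freeway"),
    ("houston", "astros game then the rodeo tonight"),
]
docs = build_pseudo_documents(training)
prediction = predict_home(
    docs, ["late because of the subway", "brooklyn bagels are the best"]
)
print(prediction)
```

Real systems would use far larger corpora, better tokenization, and usually a grid or hierarchy of regions rather than a handful of city labels.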

The survey explores how social ties on Twitter can improve home location prediction, citing models that incorporate both unidirectional and bidirectional follower-following relationships.
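As a rough sketch of how social ties might be used, the example below lets a user's neighbors with known home locations vote, giving bidirectional (mutual) ties more weight than unidirectional ones. The weights and the function `predict_from_ties` are hypothetical choices for illustration and do not reproduce any specific model from the survey.

```python
from collections import defaultdict

def predict_from_ties(following, followers, known_homes,
                      mutual_weight=2.0, one_way_weight=1.0):
    """Weighted vote over neighbors' known homes; mutual ties count more."""
    votes = defaultdict(float)
    for neighbor in following | followers:
        home = known_homes.get(neighbor)
        if home is None:
            continue  # neighbor has no known home location
        is_mutual = neighbor in following and neighbor in followers
        votes[home] += mutual_weight if is_mutual else one_way_weight
    if not votes:
        return None
    return max(votes, key=votes.get)

# Toy network (made up): "a" and "b" are mutual ties, the rest are one-way.
following = {"a", "b", "c", "d"}
followers = {"a", "b", "e"}
known_homes = {"a": "singapore", "b": "singapore",
               "c": "london", "d": "london", "e": "london"}

weighted = predict_from_ties(following, followers, known_homes)
unweighted = predict_from_ties(following, followers, known_homes,
                               mutual_weight=1.0)
print(weighted, unweighted)
```

In this toy network a plain majority vote favors London, while up-weighting mutual ties favors Singapore, which shows why the direction of a tie can matter.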

Tweet Location Prediction

Tweet location prediction hinges on real-time extraction of a tweet's geographical footprint. The methods overlap somewhat with home location prediction but are differentiated primarily by the granularity and ephemeral nature of the task.

  • Word and Location-Centric Models: N-grams are used for localization much as in home location prediction. Because a single tweet is so short, however, location-specific associations are often limited.
  • Geo-Topic Models: Topic models that incorporate geographical data feature prominently here, combining latent user interests or subjects with geography to infer locations.

The survey emphasizes that Twitter's network is of limited use for tweet location prediction, because tweet interactions are transient, whereas a person's home is a more permanent location anchor.
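To make the word-based route concrete for a single tweet, the sketch below estimates each word's distribution over locations from geotagged tweets, keeps only low-entropy ("local") words, and lets those words vote. The entropy threshold, the toy data, and the function names are assumptions made for illustration. A practical system would also require a minimum word count before trusting a word.

```python
import math
from collections import Counter, defaultdict

def word_location_counts(geotagged_tweets):
    """Count, for each word, how many tweets from each location contain it."""
    counts = defaultdict(Counter)
    for location, text in geotagged_tweets:
        for token in set(text.lower().split()):
            counts[token][location] += 1
    return counts

def entropy(counter):
    """Shannon entropy (bits) of a word's distribution over locations."""
    total = sum(counter.values())
    return -sum((c / total) * math.log2(c / total) for c in counter.values())

def predict_tweet_location(counts, text, max_entropy=0.5):
    """Sum P(location | word) over the tweet's local words; None if none found."""
    scores = defaultdict(float)
    for token in set(text.lower().split()):
        dist = counts.get(token)
        if not dist or entropy(dist) > max_entropy:
            continue  # unseen word, or too evenly spread to be "local"
        total = sum(dist.values())
        for location, c in dist.items():
            scores[location] += c / total
    if not scores:
        return None
    return max(scores, key=scores.get)

# Toy geotagged tweets (made up for illustration).
geotagged = [
    ("sydney", "ferry to manly at the harbour"),
    ("sydney", "sunset at the harbour bridge"),
    ("chicago", "deep dish at the loop"),
    ("chicago", "wind off the lake at the loop"),
]
counts = word_location_counts(geotagged)
print(predict_tweet_location(counts, "walking at the harbour"))
print(predict_tweet_location(counts, "at the"))
```

The second call returns `None` because the tweet contains only generic words, which reflects the difficulty that short tweets often carry few location-specific cues.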

Mentioned Location Prediction

Mentioned location prediction consists of recognizing and disambiguating location mentions in tweets. This task is distinct because it focuses on processing the content itself rather than inferring the author’s location from contextual clues.

  • Entity Recognition and Disambiguation: Techniques from traditional NLP, such as Conditional Random Fields (CRFs) and gazetteer lookups, are revisited and adapted to Twitter’s unique content features.
  • Joint Models: Integrated models that optimize recognition and disambiguation together could mitigate the propagation of errors from one stage to the next.
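A gazetteer-based pipeline can be sketched in a few lines. The example below assumes a tiny hand-made gazetteer in which each candidate location carries a set of context cue words. It recognizes mentions by longest n-gram match and disambiguates by counting cue words found elsewhere in the tweet. The gazetteer entries, cue words, and function names are hypothetical. CRF-based recognizers and joint models are considerably more involved.

```python
import re

# Hypothetical toy gazetteer: surface form -> candidate locations with cue words.
GAZETTEER = {
    "paris": [
        {"id": "Paris, France", "cues": {"france", "eiffel", "louvre"}},
        {"id": "Paris, Texas", "cues": {"texas", "tx", "dallas"}},
    ],
    "new york": [
        {"id": "New York City, USA", "cues": {"nyc", "manhattan", "brooklyn"}},
    ],
}
MAX_NGRAM = max(len(name.split()) for name in GAZETTEER)

def tokenize(text):
    """Lowercase and keep alphanumeric runs, so '#Paris,' becomes 'paris'."""
    return re.findall(r"[a-z0-9]+", text.lower())

def find_mentions(text):
    """Return (surface form, resolved location) pairs found in the text."""
    tokens = tokenize(text)
    context = set(tokens)
    mentions = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest n-gram first so "new york" beats a shorter match.
        for n in range(min(MAX_NGRAM, len(tokens) - i), 0, -1):
            surface = " ".join(tokens[i:i + n])
            candidates = GAZETTEER.get(surface)
            if candidates:
                # Disambiguate by cue-word overlap; ties keep the first candidate.
                best = max(candidates, key=lambda c: len(c["cues"] & context))
                mentions.append((surface, best["id"]))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return mentions

print(find_mentions("Road trip from Dallas to Paris, then flying to New York!"))
print(find_mentions("Paris is lovely"))
```

Because recognition and disambiguation run as separate steps here, a missed or spurious match in the first step cannot be corrected by the second, which is the kind of error that joint models aim to reduce.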

Implications and Future Work

The methodologies reviewed have both theoretical and practical implications. Better geolocation can benefit a range of applications, from user engagement strategies to more precise disaster management responses via social media monitoring.

The survey outlines future directions that include leveraging deep learning techniques to better model Twitter's complex relational data and incorporating multimedia content, which is becoming prevalent in social media, alongside text.

As online interactions continue to grow both in scope and scale, understanding and improving location prediction methods using dynamic data sources like Twitter will remain a vital endeavor. This survey positions itself as an essential foundation upon which further research in this domain will expand.
