Papers
Topics
Authors
Recent
Search
2000 character limit reached

Twitter-based traffic information system based on vector representations for words

Published 4 Dec 2018 in cs.IR, cs.LG, and stat.ML | (1812.01199v1)

Abstract: Recently, researchers have shown an increased interest in harnessing Twitter data for dynamic monitoring of traffic conditions. Bag-of-words representation is a common method in literature for tweet modeling and retrieving traffic information, yet it suffers from the curse of dimensionality and sparsity. To address these issues, our specific objective is to propose a simple and robust framework on the top of word embedding for distinguishing traffic-related tweets against non-traffic-related ones. In our proposed model, a tweet is classified as traffic-related if semantic similarity between its words and a small set of traffic keywords exceeds a threshold value. Semantic similarity between words is captured by means of word-embedding models, which is an unsupervised learning tool. The proposed model is as simple as having only one trainable parameter. The model takes advantage of outstanding merits, which are demonstrated through several evaluation steps. The state-of-the-art test accuracy for our proposed model is 95.9%.

Citations (1)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (2)

Collections

Sign up for free to add this paper to one or more collections.