Continuous Representation of Location for Geolocation and Lexical Dialectology using Mixture Density Networks (1708.04358v1)
Published 14 Aug 2017 in cs.CL, cs.IR, and cs.SI
Abstract: We propose a method for embedding two-dimensional locations in a continuous vector space using a neural network-based model incorporating mixtures of Gaussian distributions, presenting two model variants for text-based geolocation and lexical dialectology. Evaluated over Twitter data, the proposed model outperforms conventional regression-based geolocation and provides a better estimate of uncertainty. We also show the effectiveness of the representation for predicting words from location in lexical dialectology, and evaluate it using the DARE dataset.
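To make the "mixtures of Gaussian distributions" over two-dimensional locations concrete, the following is a minimal NumPy sketch of the log-density of one coordinate pair under a mixture of K Gaussians. It is an illustration under stated assumptions, not the paper's implementation: the function name `mixture_log_likelihood`, the diagonal-covariance simplification, and all parameter names are hypothetical, and the abstract does not specify the exact parameterisation used. In a mixture density network, a neural network would output the weights, means, and standard deviations, and training would minimise the negative of this quantity.

```python
import numpy as np


def mixture_log_likelihood(point, weights, means, sigmas):
    """Log-density of a 2D point under a mixture of K diagonal-covariance Gaussians.

    point:   shape (2,), e.g. a (latitude, longitude) pair
    weights: shape (K,), non-negative mixture weights summing to 1
    means:   shape (K, 2), component centres
    sigmas:  shape (K, 2), per-dimension standard deviations (> 0)

    Assumption: diagonal covariance (no correlation term), chosen for brevity.
    """
    point = np.asarray(point, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)

    # Standardised offsets of the point from each component centre, shape (K, 2).
    z = (point - means) / sigmas

    # Log-density of each 2D diagonal Gaussian component, shape (K,).
    log_comp = (
        -0.5 * np.sum(z ** 2, axis=1)
        - np.sum(np.log(sigmas), axis=1)
        - np.log(2.0 * np.pi)
    )

    # Combine components with a numerically stable log-sum-exp.
    log_terms = np.log(weights) + log_comp
    m = np.max(log_terms)
    return m + np.log(np.sum(np.exp(log_terms - m)))
```

For example, `mixture_log_likelihood([0.0, 0.0], [0.5, 0.5], [[0.0, 0.0], [3.0, 3.0]], [[1.0, 1.0], [1.0, 1.0]])` scores the origin under two equally weighted unit-variance components. Because the output is a full density rather than a single point, the spread of the mixture can serve as an estimate of uncertainty, which a plain regression onto coordinates does not provide.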