Out of vocabulary words decrease, running texts prevail and hashtags coalesce: Twitter as an evolving sociolinguistic system (1509.05096v1)
Abstract: Twitter is one of the most popular social media platforms. Because its data are easy to obtain, Twitter is widely used for research purposes. Twitter is known to have evolved in many respects since its inception; nevertheless, how its own linguistic style has evolved is still relatively unknown. In this paper, we study the evolution of various sociolinguistic aspects of Twitter over large time scales. To the best of our knowledge, this is the first comprehensive study of the evolution of such aspects of this online social network (OSN). We perform quantitative analysis both at the word level and on hashtags, since the hashtag is perhaps one of the most important linguistic units of this medium. We study the (in)formality of the linguistic styles on Twitter and find that they are neither fully formal nor completely informal: on one hand, Out-Of-Vocabulary words are decreasing over time (pointing to a formal style); on the other hand, whitespace usage is declining, with a strong prevalence of running texts (pointing to an informal style). We also analyze the repetition and coalescing of hashtags on Twitter and propose quantitative reasons for them. We believe that such phenomena may be strongly tied to different evolutionary aspects of human languages.
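
As an illustration of the two word-level signals mentioned in the abstract, the following is a minimal sketch of how an Out-Of-Vocabulary rate and a whitespace ratio could be computed for a single tweet. It is not the authors' implementation: the abstract does not specify the dictionary, the tokenization, or the exact definitions used, so the toy vocabulary, the helper names (`word_tokens`, `oov_rate`, `whitespace_ratio`), and the choice to skip hashtags, mentions, and URLs are all assumptions made for this example.

```python
import re

# Hypothetical toy vocabulary (assumption); a real study would use a full dictionary.
TOY_VOCABULARY = {"the", "match", "was", "great", "today", "i", "love", "this"}


def word_tokens(text):
    """Split on whitespace and keep plain word tokens.

    Hashtags, mentions, and URLs are skipped (an assumption for this sketch),
    and punctuation is stripped from the remaining tokens.
    """
    tokens = []
    for raw in text.lower().split():
        if raw.startswith(("#", "@", "http")):
            continue
        word = re.sub(r"[^a-z0-9']", "", raw)
        if word:
            tokens.append(word)
    return tokens


def oov_rate(text, vocabulary):
    """Fraction of word tokens that are not found in the vocabulary."""
    tokens = word_tokens(text)
    if not tokens:
        return 0.0
    oov_count = sum(1 for token in tokens if token not in vocabulary)
    return oov_count / len(tokens)


def whitespace_ratio(text):
    """Fraction of characters in the text that are whitespace."""
    if not text:
        return 0.0
    return sum(1 for ch in text if ch.isspace()) / len(text)


if __name__ == "__main__":
    print(oov_rate("The match was gr8 today!", TOY_VOCABULARY))
    print(whitespace_ratio("i love this"))
    print(whitespace_ratio("ilovethis"))
```

Under these assumed definitions, a falling average `oov_rate` over time would correspond to the formal trend described in the abstract, while a falling average `whitespace_ratio` would correspond to the prevalence of running texts.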