Are 140 Characters Enough? A Large-Scale Linkability Study of Tweets (1406.2746v2)
Abstract: Microblogging is a very popular Internet activity that informs and entertains vast numbers of people worldwide through terse messages, disseminated quickly and at scale, that contain all kinds of newsworthy utterances. Even though microblogging is neither designed nor meant to emphasize privacy, numerous contributors hide behind pseudonyms and compartmentalize their different incarnations via multiple accounts on the same site or across multiple sites. Prior work has shown that stylometric analysis is a very powerful tool, capable of linking product or service reviews and blogs produced by the same author when the number of authors is large. In this paper, we explore the linkability of tweets. Our results, based on a very large corpus of tweets, clearly demonstrate that, at least for relatively active tweeters, tweets by the same author are easily linked even when the number of tweeters is large. We also show that our linkability results hold for a set of actual Twitter users who tweet from multiple accounts. These findings have some obvious privacy implications, both positive and negative.
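The abstract does not specify which stylometric features or classifier are used, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It represents each account by the frequencies of overlapping character bigrams across its tweets and links each anonymous account to the known account whose profile has the highest cosine similarity. The function names (`char_ngrams`, `profile`, `cosine`, `link`) and the choice of bigrams with cosine similarity are hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt

# Illustrative sketch only: the features (character bigrams) and the
# similarity measure (cosine) are assumptions, not the paper's method.


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count overlapping, lowercased character n-grams in one text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def profile(tweets: list[str], n: int = 2) -> Counter:
    """Aggregate n-gram counts over all tweets of one account."""
    total = Counter()
    for tweet in tweets:
        total.update(char_ngrams(tweet, n))
    return total


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link(anonymous: dict[str, list[str]],
         known: dict[str, list[str]]) -> dict[str, str]:
    """Map each anonymous account to the most stylometrically similar known account."""
    known_profiles = {name: profile(tweets) for name, tweets in known.items()}
    result = {}
    for anon, tweets in anonymous.items():
        p = profile(tweets)
        result[anon] = max(known_profiles,
                           key=lambda name: cosine(p, known_profiles[name]))
    return result


if __name__ == "__main__":
    known = {
        "alice": ["lol omg lol omg"],
        "bob": ["Per the quarterly report, revenue rose."],
    }
    anonymous = {"x": ["omg lol"], "y": ["revenue report rose"]}
    print(link(anonymous, known))
```

With these toy accounts, `x` shares bigrams only with `alice` and `y` only with `bob`, so the sketch links them accordingly. A realistic study would use far more tweets per account, richer features, and a trained classifier rather than a single nearest-neighbor comparison.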