Topological-collaborative approach for disambiguating authors' names in collaborative networks (1311.1266v2)
Abstract: Concepts and methods of complex networks have been employed to uncover patterns in a myriad of complex systems. Unfortunately, the relevance and significance of these patterns strongly depend on the reliability of the data sets. In the study of collaboration networks, for instance, unavoidable noise pervades authors' collaboration data sets when different authors share the same name. To address this problem, we propose a hybrid approach based on authors' collaboration patterns and on topological features of collaborative networks. Our results show that the combination of strategies, in most cases, performs better than the traditional approach, which disregards topological features. We also show that the main factor for improving the discriminability of homonymous authors is the average distance between authors. Finally, we show that it is possible to predict the weight assigned to each strategy composing the hybrid system by examining the discrimination obtained from the traditional analysis of collaboration patterns. Because the methodology devised here is generic, our approach is potentially useful for classifying many other networked systems governed by complex interactions.
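The following minimal sketch illustrates what a weighted hybrid of a collaboration-pattern signal and a topological signal could look like. It is an illustration only, not the authors' method, and every name in it is hypothetical. It makes three assumptions. First, each ambiguous name occurrence is a separate node in a co-authorship graph (for example `silva_1` and `silva_2`). Second, the collaboration signal is the Jaccard overlap of the two nodes' coauthor sets. Third, the topological signal is the inverse of one plus their shortest-path distance. The two signals are blended with a hypothetical weight `w`.

```python
from collections import deque

# Hypothetical toy co-authorship graph: each homonymous occurrence is its own node.
COAUTHORS = {
    "silva_1": {"costa", "lima"},
    "silva_2": {"lima", "souza"},
    "silva_3": {"pereira"},
    "costa": {"silva_1", "lima"},
    "lima": {"silva_1", "silva_2", "costa"},
    "souza": {"silva_2"},
    "pereira": {"silva_3"},
}


def jaccard(a: set, b: set) -> float:
    """Collaboration-pattern signal: overlap of two coauthor sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def shortest_distance(graph: dict, src: str, dst: str) -> float:
    """Unweighted BFS distance between two nodes; inf if they are disconnected."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr == dst:
                return d + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, d + 1))
    return float("inf")


def hybrid_score(graph: dict, a: str, b: str, w: float = 0.5) -> float:
    """Hypothetical blend: w * collaboration signal + (1 - w) * topological signal."""
    collab = jaccard(graph.get(a, set()), graph.get(b, set()))
    topo = 1.0 / (1.0 + shortest_distance(graph, a, b))  # 1/(1+inf) evaluates to 0.0
    return w * collab + (1 - w) * topo


if __name__ == "__main__":
    print(hybrid_score(COAUTHORS, "silva_1", "silva_2"))  # share "lima", distance 2
    print(hybrid_score(COAUTHORS, "silva_1", "silva_3"))  # no overlap, disconnected
```

In this toy graph, `silva_1` and `silva_2` share a coauthor and lie two hops apart, so they score higher than `silva_1` and `silva_3`, which are disconnected. A classifier built on such a score would also need a decision threshold. That threshold, the value of `w`, and the use of shortest-path rather than average distance are all assumptions made for this sketch.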