CENI: a Hybrid Framework for Efficiently Inferring Information Networks (1503.04927v1)
Abstract: Nowadays, the message diffusion links among users or websites drive the development of countless innovative applications. However, in reality, it is easier for us to observe the timestamps when different nodes in the network react on a message, while the connections empowering the diffusion of the message remain hidden. This motivates recent extensive studies on the network inference problem: unveiling the edges from the records of messages disseminated through them. Existing solutions are computationally expensive, which motivates us to develop an efficient two-step general framework, Clustering Embedded Network Inference (CENI). CENI integrates clustering strategies to improve the efficiency of network inference. By clustering nodes directly on the timelines of messages, we propose two naive implementations of CENI: Infection-centric CENI and Cascade-centric CENI. Additionally, we point out the critical dimension problem of CENI: instead of one-dimensional timelines, we need to first project the nodes to an Euclidean space of certain dimension before clustering. A CENI adopting clustering method on the projected space can better preserve the structure hidden in the cascades, and generate more accurately inferred links. This insight sheds light on other related work attempting to discover or utilize the latent cluster structure in the disseminated messages. By addressing the critical dimension problem, we propose the third implementation of the CENI framework: Projection-based CENI. Through extensive experiments on two real datasets, we show that the three CENI models only need around 20% $\sim$ 50% of the running time of state-of-the-art methods. Moreover, the inferred edges of Projection-based CENI preserves or even outperforms the effectiveness of state-of-the-art methods.