Diversified Ranking for Large Graphs with Context-Aware Considerations

Published 25 Jul 2016 in cs.IR | (1607.07504v1)

Abstract: This work pertains to the diversified ranking of web resources and interconnected documents that rely on a network-like structure, e.g. web pages. A practical example would be a query for the k most relevant web pages that are, at the same time, as dissimilar from each other as possible. Relevance and dissimilarity are quantified using an aggregation of network distance and context similarity. For instance, in one configuration of the problem, we might be interested in web pages that are similar to the query in terms of their textual description but distant from each other in terms of the web graph, e.g. many clicks away. Little work in the literature addresses this problem while taking into consideration the network structure formed by the document links. In this work, we propose a hill-climbing approach that is seeded with an initial document collection generated by greedy diversification heuristics. More importantly, we tackle the problem in the context of web pages, where an underlying network structure connects the available documents and resources. This differs significantly from the majority of works, which tackle the problem in terms of either content definitions or the graph structure of the data, but never address both aspects simultaneously. To the best of our knowledge, this is the first effort to combine both aspects of this important problem in an elegant fashion, while also allowing a great degree of flexibility in configuring the trade-offs between (i) document relevance and result-item dissimilarity, and (ii) network distance and content relevance or dissimilarity. Last but not least, we present an extensive evaluation of our methods that demonstrates their effectiveness and efficiency.
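The abstract describes the method only at a high level. Relevance and dissimilarity each aggregate network distance and content similarity, a greedy heuristic builds an initial diversified set, and hill climbing refines that set. The Python sketch below illustrates this pipeline under assumptions that are not taken from the paper. Content similarity is Jaccard overlap of term sets. Network distance is the BFS hop count on an undirected graph, normalised by a hypothetical `max_hops` cap. The objective is a `lam`-weighted mix of mean relevance and mean pairwise dissimilarity, and hill climbing uses single-item swaps. The names `DiversifiedRanker`, `anchor`, `beta`, `gamma`, and `lam` are illustrative only. `lam` plays the role of trade-off (i). `beta` (for relevance) and `gamma` (for dissimilarity) play the role of trade-off (ii).

```python
from collections import deque
from itertools import combinations


def hop_distances(graph, source):
    """BFS hop counts from `source`; `graph` maps a node to its neighbours."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def jaccard(a, b):
    """Assumed content similarity: Jaccard overlap of two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class DiversifiedRanker:
    """Hypothetical sketch: greedy seeding followed by swap-based hill climbing."""

    def __init__(self, graph, terms, max_hops=4):
        self.graph = graph        # node -> neighbours (assumed undirected/symmetric)
        self.terms = terms        # node -> set of terms describing the page
        self.max_hops = max_hops  # hop count treated as "fully distant" (assumption)
        self._bfs = {}            # cached BFS results per source node

    def net_dist(self, a, b):
        """Hop distance scaled to [0, 1]; unreachable pairs count as 1."""
        if a not in self._bfs:
            self._bfs[a] = hop_distances(self.graph, a)
        hops = self._bfs[a].get(b)
        return 1.0 if hops is None else min(hops / self.max_hops, 1.0)

    def relevance(self, query_terms, anchor, d, beta):
        """beta weights textual similarity against network proximity to `anchor`."""
        score = beta * jaccard(query_terms, self.terms[d])
        if beta < 1.0:  # the network term needs an anchor node (assumed input)
            score += (1 - beta) * (1.0 - self.net_dist(anchor, d))
        return score

    def dissimilarity(self, a, b, gamma):
        """gamma weights network distance against textual dissimilarity."""
        text_dis = 1.0 - jaccard(self.terms[a], self.terms[b])
        return gamma * self.net_dist(a, b) + (1 - gamma) * text_dis

    def objective(self, S, rel, lam, gamma):
        """lam weights mean relevance against mean pairwise dissimilarity."""
        mean_rel = sum(rel[d] for d in S) / len(S)
        pairs = list(combinations(S, 2))
        mean_dis = (sum(self.dissimilarity(a, b, gamma) for a, b in pairs) / len(pairs)
                    if pairs else 0.0)
        return lam * mean_rel + (1 - lam) * mean_dis

    def rank(self, query_terms, k, anchor=None, beta=1.0, gamma=1.0, lam=0.5):
        candidates = list(self.terms)
        if k <= 0 or not candidates:
            return []
        rel = {d: self.relevance(query_terms, anchor, d, beta) for d in candidates}

        # Greedy seed: most relevant document first, then whichever addition
        # yields the highest objective for the enlarged set.
        S = [max(candidates, key=rel.get)]
        while len(S) < min(k, len(candidates)):
            rest = [d for d in candidates if d not in S]
            S.append(max(rest, key=lambda d: self.objective(S + [d], rel, lam, gamma)))

        # Hill climbing: apply the best single swap (one selected item replaced
        # by one unselected item) until no swap improves the objective.
        best = self.objective(S, rel, lam, gamma)
        while True:
            best_swap = None
            for i in range(len(S)):
                for d in candidates:
                    if d in S:
                        continue
                    T = S[:i] + [d] + S[i + 1:]
                    score = self.objective(T, rel, lam, gamma)
                    if score > best + 1e-12:
                        best, best_swap = score, T
            if best_swap is None:
                return S
            S = best_swap


if __name__ == "__main__":
    graph = {"a1": ["a2"], "a2": ["a1"]}  # b1 and c1 are isolated pages
    terms = {"a1": {"graph", "ranking"}, "a2": {"graph", "ranking"},
             "b1": {"graph", "ranking"}, "c1": {"cooking"}}
    print(DiversifiedRanker(graph, terms).rank({"graph", "ranking"}, k=2))  # ['a1', 'b1']
```

The defaults (`beta=1.0`, `gamma=1.0`) make relevance purely textual and dissimilarity purely topological. This mirrors the abstract's example of pages that are textually similar to the query but many clicks apart from each other. In the demo, `a1`, `a2`, and `b1` are equally relevant, but `b1` is chosen over the linked page `a2` because it is farther away in the graph. Every accepted swap strictly increases the objective, so the local search terminates. However, it may stop at a local optimum rather than the global one.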