The paper "Predicting citation counts based on deep neural network learning techniques" by Ali Abrishami and Sadegh Aliakbary focuses on developing a method to predict future citation counts of scientific papers using artificial neural networks. Here is a comprehensive summary:
Objective and Importance
The paper addresses the need to predict the future impact of research papers, a core task in scientometrics, informetrics, and bibliometrics. Accurate citation-count predictions allow early identification of influential papers, which can aid researchers in setting research directions and help institutions in hiring decisions and grant awards. The authors highlight citation count as a critical metric not only for individual papers but also as the basis of broader measures such as the author-level h-index and the journal-level impact factor.
Methodology
The authors propose a novel method focused on predicting long-term citation counts based solely on the citation data from the paper's early years, specifically the first three to five years post-publication. This choice is made to keep the problem simple and generalizable without relying on additional information like author details or journal characteristics.
The solution is framed as a regression learning problem solved with deep learning, specifically artificial neural networks. The paper uses Recurrent Neural Networks (RNNs), which are well suited to the sequential structure of citation timelines. The authors design a sequence-to-sequence architecture with an encoder-decoder structure: the encoder captures the pattern of early citation counts, and the decoder predicts future citations year by year. The network is trained on a dataset of existing papers to learn what early citation sequences imply about a paper's long-term impact.
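The encoder-decoder idea can be sketched as follows. This is a minimal, untrained illustration in NumPy, not the authors' implementation: the hidden size, the input/output horizons, the random weights, and the simple tanh recurrence are all assumptions made for clarity. A real model would be trained on historical citation sequences so that the encoder's final state summarizes the early citation pattern.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 5 observed years in, 10 predicted years out.
HIDDEN = 8
N_IN, N_OUT = 5, 10

# Encoder/decoder weights (untrained, random; a trained model would
# learn these from citation histories of existing papers).
W_enc = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
U_enc = rng.normal(scale=0.1, size=(HIDDEN, 1))
W_dec = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
V_out = rng.normal(scale=0.1, size=(1, HIDDEN))

def encode(early_counts):
    """Fold the early-year citation counts into a fixed-size hidden state."""
    h = np.zeros((HIDDEN, 1))
    for c in early_counts:
        h = np.tanh(W_enc @ h + U_enc * c)
    return h

def decode(h, n_years=N_OUT):
    """Unroll the decoder to emit one predicted count per future year."""
    preds = []
    for _ in range(n_years):
        h = np.tanh(W_dec @ h)
        preds.append(float(V_out @ h))
    return preds

early = [2, 5, 9, 14, 20]        # made-up citations in years 1-5
future = decode(encode(early))   # one prediction per future year
print(len(future))               # prints 10
```

The design choice worth noting is that the only input is the early citation sequence itself, which is what keeps the model applicable when author or venue metadata is unavailable.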
Dataset
The research draws upon citation data from the Web of Science, specifically targeting high-impact journals like Nature, Science, NEJM, Cell, and PNAS. The dataset consists of papers published from 1980 to 2002, allowing the researchers to use historical citation data to train and test the model effectively.
Evaluation
The paper conducts empirical experiments to demonstrate the method's effectiveness, comparing its performance against state-of-the-art baselines, including methods based on nearest-neighbor and clustering techniques. The authors use Root Mean Square Error (RMSE) and the coefficient of determination (R²) to measure prediction accuracy. Their findings show that the proposed neural network achieves better prediction accuracy for both yearly and total citation counts than the traditional models.
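As an illustration of the two evaluation metrics, RMSE and R² can be computed as below. The citation counts are made-up toy values, not data from the paper.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error: average magnitude of prediction error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: variance explained by the predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Toy example: true vs. predicted total citation counts for five papers.
true = [120, 45, 300, 10, 80]
pred = [110, 50, 280, 15, 90]
print(rmse(true, pred))  # ≈ 11.40
print(r2(true, pred))    # ≈ 0.987
```

Lower RMSE and R² closer to 1 both indicate better agreement between predicted and actual citation counts.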
Key Findings
- Improved Accuracy: The proposed neural network model outperforms existing methods in prediction accuracy by effectively capturing and exploiting early citation patterns.
- Model Generality: By focusing only on early citation counts, the model maintains simplicity and generality, making it applicable across different domains where additional detailed metadata might not be available.
- Efficiency: The use of neural networks enables efficient training and prediction processes, especially compared to methods that require on-the-fly analysis of large datasets for each query.
Conclusion and Future Work
The research concludes that deep learning techniques, particularly those designed to handle sequential data, are promising for citation prediction tasks. As future directions, the authors suggest expanding the model to include additional features, such as text analysis from paper contents and extending its application to higher-level tasks like predicting author influence metrics, such as future h-indices.
Through this paper, Abrishami and Aliakbary contribute to the growing practice of leveraging advanced machine learning techniques to extend the capabilities and applications of bibliometric analyses in assessing research impact.