Neural Embeddings of Scholarly Periodicals Reveal Complex Disciplinary Organizations (2001.08199v2)

Published 22 Jan 2020 in cs.DL, cs.SI, and physics.soc-ph

Abstract: Understanding the structure of knowledge domains is one of the foundational challenges in science of science. Here, we propose a neural embedding technique that leverages the information contained in the citation network to obtain continuous vector representations of scientific periodicals. We demonstrate that our periodical embeddings encode nuanced relationships between periodicals as well as the complex disciplinary and interdisciplinary structure of science, allowing us to make cross-disciplinary analogies between periodicals. Furthermore, we show that the embeddings capture meaningful "axes" that encompass knowledge domains, such as an axis from "soft" to "hard" sciences or from "social" to "biological" sciences, which allow us to quantitatively ground periodicals on a given dimension. By offering novel quantification in science of science, our framework may in turn facilitate the study of how knowledge is created and organized.

Summary

The paper utilizes neural embedding techniques on a large citation network to create dense vector representations for scholarly periodicals, mapping complex disciplinary structures.
These neural embeddings accurately capture periodical similarity, align with expert perceptions, and improve the prediction of disciplinary categories compared to traditional methods.
The technique enables quantitative analyses like projecting periodicals onto conceptual dimensions and exploring cross-disciplinary analogies through vector arithmetic.

Neural Embeddings of Scholarly Periodicals: Insights into Disciplinary Structures

This paper presents a neural embedding technique that uses the citation network to map how scientific periodicals are organized. The technique produces continuous vector representations of periodicals that encode nuanced relationships and reveal both disciplinary and interdisciplinary structure. Unlike traditional classification systems, it supports quantitative analyses such as cross-disciplinary analogies and the identification of axes that represent conceptual dimensions of science.

Methodology

The authors utilized a network embedding method based on the DeepWalk and node2vec models. These models adapt the word2vec framework to network contexts, treating citation paths as analogous to sentences, with periodicals as words within those sentences. The embedding process employed a large dataset from the Microsoft Academic Graph, encompassing 53 million papers and 402 million citation pairs, resulting in a representation of over 20,000 periodicals in 100-dimensional space.
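The "citation paths as sentences" idea can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy graph, the periodical names, and the function names `random_walk` and `generate_walks` are all hypothetical, and the walks are uniform (DeepWalk-style) rather than the biased walks that node2vec allows.

```python
import random

# Hypothetical toy graph: each periodical maps to periodicals it is linked to by citation.
TOY_GRAPH = {
    "J. Physics A": ["J. Physics B", "J. Math"],
    "J. Physics B": ["J. Physics A", "J. Chem"],
    "J. Math": ["J. Physics A"],
    "J. Chem": ["J. Physics B", "J. Bio"],
    "J. Bio": ["J. Chem"],
}


def random_walk(graph, start, length, rng):
    """Follow uniformly chosen citation links from `start` for up to `length` nodes."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph.get(walk[-1], [])
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def generate_walks(graph, walks_per_node, length, seed=0):
    """Build a corpus of walks; each walk plays the role of one 'sentence'."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)  # visit start nodes in random order on each pass
        for node in nodes:
            walks.append(random_walk(graph, node, length, rng))
    return walks


walks = generate_walks(TOY_GRAPH, walks_per_node=3, length=5)
```

In a full pipeline, a corpus like `walks` would be passed to a skip-gram trainer (for example a word2vec implementation) so that periodicals appearing in similar walk contexts receive nearby vectors. The walk count and length here are illustrative values, not the paper's settings.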

Validation and Results

The effectiveness of the periodical embeddings was validated across several tasks:

Similarity Analysis: The embeddings demonstrated superior capability in capturing periodical similarities compared to traditional citation vector and Jaccard similarity models. The dense embedding approach offered improved interpretability in distinguishing closely related journals within the same disciplines and sub-disciplines.
Expert Alignment: A survey of domain experts confirmed the embeddings' ability to align with perceptions of topical similarity among journals. The neural embeddings provided a computationally efficient alternative to traditional methods while maintaining comparable accuracy.
Categorical Predictions: Periodicals' discipline categories were predicted with greater accuracy using the new embedding model compared to sparse vector models, highlighting the embeddings' ability to capture latent relationships beyond direct citation counts.
Disciplinary Mapping: The t-SNE visualizations of the embedding space revealed intricate disciplinary boundaries and clustered interdisciplinary fields like neuroimaging and parasite research, which traditional categorizations failed to capture adequately.
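The two kinds of similarity compared above can be sketched as follows. This is a minimal illustration under stated assumptions: cosine similarity is applied to dense embedding vectors, and Jaccard similarity to sets of neighboring periodicals. The function names and toy inputs are hypothetical, and the paper's exact baseline definitions may differ.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard_similarity(a, b):
    """Overlap of two neighbor sets: |A ∩ B| / |A ∪ B| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical inputs: two embedding vectors and two sets of cited periodicals.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
overlap = jaccard_similarity({"J. Chem", "J. Bio"}, {"J. Bio", "J. Math"})
```

Jaccard similarity only counts shared direct neighbors, whereas cosine similarity on learned vectors can score two periodicals as close even when they share few direct citation partners.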

Conceptual Dimensions and Analogies

The paper illustrates the use of embeddings to derive and explore conceptual scientific dimensions. The authors constructed axes between general disciplines, such as "soft" versus "hard" sciences, and social versus life sciences. Periodicals were projected onto these axes, revealing expected hierarchies and nuanced arrangements among disciplines.
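One plausible way to build such an axis, sketched here as an assumption rather than the authors' exact construction, is to take the difference between the centroids of two groups of seed periodicals and project every other periodical onto the resulting unit vector. The two-dimensional toy vectors and the names `centroid`, `make_axis`, and `project` are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def make_axis(from_seeds, to_seeds):
    """Unit vector pointing from the centroid of `from_seeds` to that of `to_seeds`."""
    start, end = centroid(from_seeds), centroid(to_seeds)
    diff = [e - s for s, e in zip(start, end)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("seed groups have the same centroid; axis is undefined")
    return [d / norm for d in diff]


def project(vector, axis):
    """Scalar position of `vector` along a unit-length `axis` (dot product)."""
    return sum(v * a for v, a in zip(vector, axis))


# Hypothetical 2-D embeddings standing in for the 100-dimensional vectors.
soft_seeds = [[-1.0, 0.0], [-1.0, 0.2]]
hard_seeds = [[1.0, 0.0], [1.0, -0.2]]
axis = make_axis(soft_seeds, hard_seeds)

score_near_hard = project([0.5, 0.0], axis)
score_near_soft = project([-0.5, 0.0], axis)
```

A larger score places a periodical closer to the "hard" end of this toy axis. The ranking depends on which seed periodicals define each pole, so the choice of seeds is itself a modeling decision.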

Furthermore, analogy graphs constructed from periodical embeddings demonstrated the potential for cross-disciplinary exploration. These graphs exploited vector arithmetic, akin to analogies in word embeddings, to systematically navigate inter-discipline relationships.
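The vector arithmetic behind such analogies can be sketched in the style of word-embedding analogies: for "a is to b as c is to ?", compute b - a + c and return the nearest remaining periodical by cosine similarity. The toy embeddings, periodical names, and the `analogy` function are hypothetical, and the paper's analogy graphs may be constructed differently.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def analogy(embeddings, a, b, c):
    """Answer 'a is to b as c is to ?' using the offset vector b - a + c."""
    target = [vb - va + vc for va, vb, vc in
              zip(embeddings[a], embeddings[b], embeddings[c])]
    # Exclude the three query periodicals so the answer is a new item.
    candidates = [name for name in embeddings if name not in (a, b, c)]
    return max(candidates, key=lambda name: cosine_similarity(embeddings[name], target))


# Hypothetical 2-D embeddings: first axis ~ field, second axis ~ theory vs. applied.
TOY_EMBEDDINGS = {
    "Physics Theory": [1.0, 1.0],
    "Physics Applied": [1.0, 0.0],
    "Biology Theory": [-1.0, 1.0],
    "Biology Applied": [-1.0, 0.0],
    "Unrelated": [0.0, -1.0],
}

answer = analogy(TOY_EMBEDDINGS, "Physics Theory", "Physics Applied", "Biology Theory")
```

In this toy setup the offset from "Physics Theory" to "Physics Applied" is applied to "Biology Theory", and the nearest remaining vector is "Biology Applied".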

Implications and Future Directions

The proposed embedding method has substantial implications for the science of science. It offers a refined, data-driven approach to understanding and navigating the scientific landscape, potentially yielding insights into how knowledge is created and how it evolves. The ability to represent interdisciplinary connections quantitatively could enhance collaboration across fields and inform science policy and funding decisions.

Future work may integrate temporal dynamics to capture how scientific domains evolve over time, or extend the embedding method to larger datasets. Addressing limitations such as the reliance on data quality and exploring alternative embedding techniques remain important areas for further research. This work underscores the potential of neural embeddings as tools for understanding how scholarly knowledge is structured.
