- The paper presents a system that mines and visualizes computer science scholar networks using a MAG-derived knowledge graph with over 433 million connections.
- It employs a three-tier architecture integrating TITAN, HBase, Redis, and d3.js for efficient data storage, processing, and interactive visualization.
- It features intelligent semantic queries and fuzzy matching along with academic ranking and open API support, enhancing research into scholarly networks.
This paper introduces "Web of Scholars," a system designed to search, mine, and visualize the complex networks connecting scholars, primarily in Computer Science. It addresses two problems: navigating the rapidly growing volume of scholarly data, and uncovering implicit relationships that existing academic search tools often miss because they focus mainly on explicit connections such as co-authorship and citations.
The core of "Web of Scholars" is a knowledge graph built using data primarily extracted from the Microsoft Academic Graph (MAG). This graph stores information on over 1.7 million scholars, 1.5 million publications, and various types of relationships totaling over 433 million connections.
System Architecture and Implementation:
The system employs a standard three-tier architecture:
- Data Access Layer: Manages data storage and retrieval. It utilizes the TITAN graph database, backed by HBase, for storing the relationship knowledge graph. Redis is used for caching academic rankings to ensure fast retrieval.
- Business Logic Layer: Implements the core functionalities using the Spring MVC framework. It handles data processing, relationship mining, and interaction logic.
- Application Layer: Provides the user interface and visualization. It uses FreeMarker as a template engine, d3.js for rendering interactive network visualizations, and Bootstrap for styling. It accepts both simple searches and intelligent semantic queries (e.g., "Find Bob's advisor"). Auto-completion relies on fuzzy matching, implemented with HBase's FuzzyRowFilter.
The system is designed for resilience using a distributed setup with Hadoop, HBase, and Zookeeper.
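The fuzzy matching used for auto-completion can be illustrated with a small pure-Python emulation of mask-based row-key matching. This is only a sketch of the general idea behind a fuzzy row filter (a pattern plus a per-byte mask, where 0 marks a fixed byte and 1 marks a wildcard); it does not call HBase, and the row-key layout (lower-cased scholar names), the function names, and the handling of keys shorter than the pattern are assumptions made for illustration.

```python
from typing import Iterable, List


def fuzzy_match(row_key: bytes, pattern: bytes, mask: bytes) -> bool:
    """Return True if row_key agrees with pattern at every fixed position.

    mask[i] == 0: position i must equal pattern[i].
    mask[i] == 1: position i may hold any byte.
    Bytes of row_key beyond the pattern length are ignored.
    """
    if len(pattern) != len(mask):
        raise ValueError("pattern and mask must have the same length")
    if len(row_key) < len(pattern):
        return False  # assumption of this sketch: short keys never match
    # zip stops at the pattern length, so only the leading bytes are compared
    return all(m == 1 or k == p for k, p, m in zip(row_key, pattern, mask))


def autocomplete(keys: Iterable[bytes], typed: str, limit: int = 5) -> List[str]:
    """Suggest keys whose leading bytes equal the typed text."""
    pattern = typed.lower().encode("utf-8")
    mask = bytes(len(pattern))  # all zeros: every typed byte is fixed
    hits = [k.decode("utf-8") for k in keys if fuzzy_match(k, pattern, mask)]
    return sorted(hits)[:limit]


# Hypothetical row keys: lower-cased scholar names.
ROW_KEYS = [b"alice zhang", b"bob li", b"bob liu", b"carol wang"]

suggestions = autocomplete(ROW_KEYS, "Bob l")
wildcard_hit = fuzzy_match(b"bob liu", b"b?b", bytes([0, 1, 0]))
```

With an all-zero mask the match reduces to a prefix test, which is the auto-completion case; setting a mask byte to 1 lets that position vary, as in the `b?b` example.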
Key Functionalities:
- Relationship Knowledge Graph: Visualizes three kinds of scholar networks:
  - Collaboration Network: Shows co-authorship links (thickness indicating frequency), geographic distribution of collaborators (using Google Maps API and d3.js), and collaboration trends over time.
  - Advisor-Advisee Network: Mines and displays academic lineage. This relies on a machine learning model, Shifu2 [liu2019shifu2], trained on ground truth data (from PhDTree, matched with MAG) to predict advisor-advisee pairs within the larger MAG dataset based on publication metadata and network features. It also includes an advisor recommendation service based on student preferences and feature matching.
  - Citation Network: Displays scholar-level citation and co-citation networks, distinguishing the roles of citing/cited scholars (e.g., advisor, advisee, co-author) using visual cues like color.
- Academic Ranking: Ranks scholars based on multiple metrics like collaborator count, advisee count, advisor influence, citation counts, and a "Potential Index."
- Intelligent Query: Supports semantic queries and fuzzy matching for scholar searches.
- Visualization: Uses d3.js to provide interactive visualizations of the knowledge graph components.
- Open API: Provides API access to the relationship data, allowing external users and systems to build upon the platform for applications like reviewer recommendation, team formation, funding analysis, and further research into academic social networks.
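To make the academic ranking concrete, the following is a minimal sketch of ranking scholars by one count-based metric at a time. The record type, field names, and sample numbers are hypothetical; the summary does not give the system's actual data model, and the "Potential Index" is left out because its formula is not described here. In the described system such rankings are cached in Redis, whereas this sketch simply recomputes them on each call.

```python
from dataclasses import dataclass
from typing import List

# Metrics this sketch can rank by (hypothetical field names).
METRICS = ("collaborators", "advisees", "citations")


@dataclass(frozen=True)
class ScholarStats:
    name: str
    collaborators: int
    advisees: int
    citations: int


def top_k(scholars: List[ScholarStats], metric: str, k: int = 3) -> List[str]:
    """Return the names of the k scholars with the highest value of metric.

    Ties are broken alphabetically by name so the order is deterministic.
    """
    if metric not in METRICS:
        raise ValueError(f"unknown metric: {metric!r}")
    ordered = sorted(scholars, key=lambda s: (-getattr(s, metric), s.name))
    return [s.name for s in ordered[:k]]


# Made-up sample data for illustration only.
SCHOLARS = [
    ScholarStats("Alice", collaborators=40, advisees=5, citations=1200),
    ScholarStats("Bob", collaborators=25, advisees=9, citations=3100),
    ScholarStats("Carol", collaborators=40, advisees=2, citations=800),
]

by_collaborators = top_k(SCHOLARS, "collaborators")
by_citations = top_k(SCHOLARS, "citations", k=2)
```

Keeping each metric as a separate ranking, rather than blending them into one score, mirrors the summary's description of ranking "based on multiple metrics" without assuming any particular weighting.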
Data Processing:
The system processes MAG data focused on Computer Science. It extracts publications, authors, and relevant metadata to build the knowledge graph, identifies collaboration and citation links, and, crucially, predicts advisor-advisee links using the trained Shifu2 model.
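As an illustration of the collaboration-link step, the sketch below derives weighted co-authorship edges from per-publication author lists, where the weight is the number of papers two scholars share (the quantity the collaboration view draws as link thickness). The input shape and the author identifiers are assumptions; this is not MAG's schema or the system's actual pipeline.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def coauthor_weights(papers: Iterable[List[str]]) -> Counter:
    """Count, for each unordered author pair, how many papers they share."""
    weights: Counter = Counter()
    for authors in papers:
        # Deduplicate and sort so each pair has one canonical (a, b) form.
        for a, b in combinations(sorted(set(authors)), 2):
            weights[(a, b)] += 1
    return weights


def strongest_links(weights: Counter, k: int = 3) -> List[Tuple[Tuple[str, str], int]]:
    """Return the k heaviest edges, ties broken by author pair."""
    return sorted(weights.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical author lists, one per publication.
PAPERS = [
    ["A1", "A2"],
    ["A2", "A1", "A3"],
    ["A3"],  # single-author paper: contributes no edge
]

edges = coauthor_weights(PAPERS)
top_edge = strongest_links(edges, k=1)
```

Citation links could be aggregated to the scholar level in the same counting style, while advisor-advisee links require a trained model and are outside the scope of this sketch.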
In essence, "Web of Scholars" presents a practical implementation of a knowledge graph for academic data. It integrates data mining techniques (specifically for advisor-advisee relationship prediction) with graph database storage and web-based visualization to offer enhanced search, analysis, and recommendation capabilities beyond traditional academic search engines. The provision of an open API emphasizes its potential as a foundational tool for further research and application development in the "Science of Science."