
Enhancing Research Information Systems with Identification of Domain Experts (2404.02921v1)

Published 28 Mar 2024 in cs.DL, cs.HC, and cs.IR

Abstract: Research organisations and their research outputs have been growing considerably in the past decades. This large body of knowledge attracts various stakeholders, e.g., for knowledge sharing, technology transfer, or potential collaborations. However, due to the large amount of complex knowledge created, traditional methods of manually curating catalogues are often out of date, imprecise, and cumbersome. Finding domain experts and knowledge within any larger organisation, scientific and also industrial, has thus become a serious challenge. Hence, exploring an institution's domain knowledge and finding its experts can only be solved by an automated solution. This work presents the scheme of an automated approach for identifying scholarly experts based on their publications and, prospectively, their teaching materials. Based on a search engine, this approach is currently being implemented for two universities, for which some examples are presented. The proposed system will be helpful for finding peer researchers as well as starting points for knowledge exploitation and technology transfer. As the system is designed in a scalable manner, it can easily include additional institutions and hence provide a broader coverage of research facilities in the future.

Summary

  • The paper presents an automated scheme that enhances expert identification accuracy by leveraging publication metadata and LLM-based analysis.
  • The methodology integrates diverse data sources, including university websites and Google Scholar profiles, to overcome coarse manual categorizations.
  • Preliminary results indicate improved expert visibility and the ability to detect emerging research trends, supporting scalable system implementation.

Enhancing Research Information Systems through Automated Identification of Domain Experts

Introduction to the Research Effort

Research institutions are primary nodes in the network of knowledge creation and dissemination. Identifying domain expertise within these institutions has historically been a challenge due to the limitations of manually curating and updating databases of scholars' profiles and outputs. The paper by Gautam Kishore Shahi and Oliver Hummel addresses this challenge by proposing an automated approach to identify scholarly experts based on their publications and potentially their teaching materials.

The Core Challenge

Research organizations and their outputs have grown considerably over the past decades, making the manual curation of expert databases impractical, imprecise, and cumbersome. This growth has been paralleled by an increase in the number of stakeholders seeking to leverage this knowledge for collaboration, technology transfer, and knowledge sharing. However, existing Research Information Management Systems (RIMS) often fail to update researchers' domains accurately and promptly, which reduces the visibility and accessibility of expertise.

Proposed Solution

The authors present an automated scheme built upon a search engine framework, currently implemented across two universities, to identify experts by analyzing the fields of research indicated by their publications and other publicly available materials. This system aims to address the disconnect between the capacity of RIMS to manage research metadata and the needs of stakeholders seeking specific expertise.
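To make the search-engine idea concrete, here is a minimal sketch of expert lookup over publication titles. It is not the authors' implementation: the names `build_index` and `find_experts`, the sample records, and the scoring rule (count of matching publications per author) are all assumptions chosen for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(publications):
    """Map each title term to a Counter of author -> number of titles containing it."""
    index = defaultdict(Counter)
    for pub in publications:
        for term in set(tokenize(pub["title"])):
            for author in pub["authors"]:
                index[term][author] += 1
    return index


def find_experts(index, query, top_k=3):
    """Rank authors by summed term matches (assumed scoring); ties break by name."""
    scores = Counter()
    for term in tokenize(query):
        scores.update(index.get(term, Counter()))
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Hypothetical sample data, not taken from the paper.
publications = [
    {"title": "Knowledge graphs for exploratory search", "authors": ["A. Example"]},
    {"title": "Exploratory search in digital libraries", "authors": ["A. Example", "B. Sample"]},
    {"title": "Ontology design for transportation", "authors": ["B. Sample"]},
]
index = build_index(publications)
experts = find_experts(index, "exploratory search")
```

A production system would likely rely on a full-text search engine with proper relevance ranking; the inverted index above only illustrates the mapping from query terms to candidate experts.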

Implementation Insights

The implementation process revealed several insights:

  • Data Gathering and Processing: The process involves gathering professors' names from university websites, crawling publication data, and extracting publications' content. This is followed by identifying research areas using a combination of metadata from university web pages, Google Scholar profiles, and content analysis of publications through LLMs like ChatGPT.
  • Challenges with Manual and Automated Data Capture: While manually attributed research areas tend to be coarse-grained, the extraction of research areas through LLMs provides more detailed insights but may risk over-specification.
  • Lessons on Visual Representation of Data: Word clouds have been instrumental as a visual aid for representing the spread of expertise in an institution, though they also highlighted the need for better representation methods that accommodate differences in granularity and language.
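The data flow described in the list above can be sketched as follows, assuming the three sources have already been collected as plain lists of area labels. The function names, the normalization rule, and the sample labels are hypothetical; the LLM step is represented only by its output list, since the paper's prompts and model calls are not given here.

```python
from collections import Counter


def normalize(area):
    """Lowercase and collapse whitespace so equal labels from different sources match."""
    return " ".join(area.lower().split())


def merge_research_areas(website_areas, scholar_areas, llm_areas):
    """Combine areas from the three sources, recording which sources support each."""
    merged = {}
    sources = (
        ("website", website_areas),
        ("scholar", scholar_areas),
        ("llm", llm_areas),
    )
    for source, areas in sources:
        for area in areas:
            merged.setdefault(normalize(area), set()).add(source)
    return merged


def word_cloud_weights(profiles):
    """Count how many researchers list each area; a word cloud could scale by this."""
    counts = Counter()
    for areas in profiles.values():
        counts.update(areas.keys())
    return counts


# Hypothetical labels: coarse website entry, finer Scholar and LLM-derived entries.
profile_a = merge_research_areas(
    ["Software Engineering"],
    ["software engineering", "Code Search"],
    ["test-driven code search"],
)
profile_b = merge_research_areas(["Information Retrieval"], ["Code Search"], [])
weights = word_cloud_weights({"A": profile_a, "B": profile_b})
```

Keeping the supporting sources per area makes the granularity trade-off visible: a label backed only by the LLM-derived list may be over-specific, while one backed only by the website may be too coarse.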

Preliminary Results

The prototype has successfully demonstrated its capability to identify specific research expertise within the institutions, improving upon the granularity provided by manually curated databases. Further, it highlighted an ancillary benefit of unearthing recent trends within individual publication histories, showcasing the algorithm's potential to keep pace with the rapid evolution of academic expertise domains.
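One simple way to surface recent trends in a publication history is to compare topics before and after a cutoff year. The sketch below is an assumption about how this could work, not the method used in the paper; `emerging_topics`, the cutoff, and the sample history are illustrative only.

```python
def emerging_topics(history, since_year):
    """Return topics that appear from since_year onward but never before it."""
    earlier, recent = set(), set()
    for year, topics in history:
        if year >= since_year:
            recent.update(topics)
        else:
            earlier.update(topics)
    return sorted(recent - earlier)


# Hypothetical (year, topics) pairs for one researcher.
history = [
    (2018, ["software reuse"]),
    (2019, ["software reuse", "code search"]),
    (2023, ["large language models", "code search"]),
    (2024, ["large language models"]),
]
new_topics = emerging_topics(history, since_year=2022)
```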

Future Directions

The research outlined several avenues for improvement and expansion:

  • Expansion Across Institutions: Future work will focus on integrating data from additional institutions to validate the system's scalability and general applicability.
  • Data Source Diversification: Incorporating data from additional platforms, including ResearchGate and DBLP, could broaden coverage.
  • Visual and Functional Enhancements: Improving the search interface and the accuracy of visual data representations such as word clouds will be prioritized.
  • Algorithmic Refinement: The potential to refine the granularity and accuracy of expertise identification through advanced LLMs and semantic web technologies offers an exciting trajectory for future research.

Conclusion

This work lays a foundation for future developments in automated expert identification in academic settings. It aims to make domain-specific expertise more accessible, facilitating collaboration, knowledge transfer, and scholarship. With further refinement and expansion, the proposed system could substantially change the way institutions manage and share their intellectual capital.
