
AceMap: Knowledge Discovery through Academic Graph (2403.02576v2)

Published 5 Mar 2024 in cs.DL, cs.LG, and cs.SI

Abstract: The exponential growth of scientific literature requires effective management and extraction of valuable insights. While existing scientific search engines excel at delivering search results based on relational databases, they often neglect the analysis of collaborations between scientific entities and the evolution of ideas, as well as the in-depth analysis of content within scientific publications. The representation of heterogeneous graphs and the effective measurement, analysis, and mining of such graphs pose significant challenges. To address these challenges, we present AceMap, an academic system designed for knowledge discovery through academic graph. We present advanced database construction techniques to build the comprehensive AceMap database with large-scale academic entities that contain rich visual, textual, and numerical information. AceMap also employs innovative visualization, quantification, and analysis methods to explore associations and logical relationships among academic entities. AceMap introduces large-scale academic network visualization techniques centered on nebular graphs, providing a comprehensive view of academic networks from multiple perspectives. In addition, AceMap proposes a unified metric based on structural entropy to quantitatively measure the knowledge content of different academic entities. Moreover, AceMap provides advanced analysis capabilities, including tracing the evolution of academic ideas through citation relationships and concept co-occurrence, and generating concise summaries informed by this evolutionary process. In addition, AceMap uses machine reading methods to generate potential new ideas at the intersection of different fields. Exploring the integration of LLMs and knowledge graphs is a promising direction for future research in idea evolution. Please visit https://www.acemap.info for further exploration.
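
To give a rough sense of what a structural-entropy measure on a graph looks like, the sketch below computes the standard one-dimensional structural entropy, H(G) = -Σ_i (d_i / 2m) log2(d_i / 2m), where d_i is the degree of node i and m is the number of edges. This is only an illustration under stated assumptions: the abstract does not specify AceMap's unified metric, so the formula choice, the function name `one_dim_structural_entropy`, and the toy edge list are all hypothetical and should not be read as the authors' method.

```python
import math
from collections import Counter


def one_dim_structural_entropy(edges):
    """One-dimensional structural entropy (in bits) of an undirected graph.

    `edges` is an iterable of (u, v) pairs. This is the textbook degree-based
    definition, used here as an assumption; it is not AceMap's actual metric.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    volume = sum(degree.values())  # equals 2m, twice the number of edges
    if volume == 0:
        return 0.0  # an empty graph carries no structural information

    # Shannon entropy of the degree distribution d_i / 2m
    return -sum((d / volume) * math.log2(d / volume) for d in degree.values())


# Hypothetical toy graph: paper A is linked to B, and B is linked to C.
toy_edges = [("A", "B"), ("B", "C")]
print(one_dim_structural_entropy(toy_edges))
```

In the toy graph the degrees are 1, 2, and 1, so the degree distribution is (0.25, 0.5, 0.25). Graphs whose links are spread more evenly across nodes score higher than graphs dominated by a single hub.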
