
HINTs: Sensemaking on large collections of documents with Hypergraph visualization and INTelligent agents (2403.02752v1)

Published 5 Mar 2024 in cs.HC

Abstract: Sensemaking on a large collection of documents (corpus) is a challenging task often found in fields such as market research, legal studies, intelligence analysis, political science, and computational linguistics. Previous work approaches this problem from either a topic-based or an entity-based perspective, but these approaches lack interpretability and trust due to poor model alignment. In this paper, we present HINTs, a visual analytics approach that combines topic- and entity-based techniques seamlessly and integrates LLMs as both a general NLP task solver and an intelligent agent. By leveraging the extraction capability of LLMs in the data preparation stage, we model the corpus as a hypergraph that matches the user's mental model when making sense of the corpus. The constructed hypergraph is hierarchically organized with an agglomerative clustering algorithm that combines semantic and connectivity similarity. The system further integrates an LLM-based intelligent chatbot agent in the interface to facilitate sensemaking. To demonstrate the generalizability and effectiveness of the HINTs system, we present two case studies on different domains and a comparative user study. We report our insights on the behavior patterns and challenges that arise when intelligent agents are used to facilitate sensemaking. We find that while intelligent agents can address many challenges in sensemaking, the visual hints that visualizations provide are necessary to address the new problems brought by intelligent agents. We discuss limitations and future work for combining interactive visualization and LLMs more profoundly to better support corpus analysis.
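
The abstract describes organizing the hypergraph hierarchically with agglomerative clustering that combines semantic and connectivity similarity, without giving the formulation. The following is a minimal sketch of that general idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: each document carries a toy embedding and a set of extracted entities, semantic similarity is cosine similarity, connectivity similarity is the Jaccard overlap of entity sets, the two are mixed by a hypothetical weight `alpha`, and clusters are merged with average linkage.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Share of entities two documents have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(doc_a, doc_b, alpha=0.5):
    """Weighted mix of semantic and connectivity similarity (alpha is an assumption)."""
    semantic = cosine(doc_a["embedding"], doc_b["embedding"])
    connectivity = jaccard(doc_a["entities"], doc_b["entities"])
    return alpha * semantic + (1 - alpha) * connectivity


def agglomerate(docs, alpha=0.5):
    """Average-linkage agglomerative clustering; returns the merge history."""
    sim = {
        frozenset((a, b)): combined_similarity(docs[a], docs[b], alpha)
        for a, b in combinations(docs, 2)
    }

    def linkage(c1, c2):
        # Mean pairwise similarity between the members of two clusters.
        total = sum(sim[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(doc_id,) for doc_id in docs]
    merges = []
    while len(clusters) > 1:
        # Merge the most similar pair of clusters and record the step.
        c1, c2 = max(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Toy corpus: embeddings and entity sets are made up for illustration.
docs = {
    "d1": {"embedding": [1.0, 0.0], "entities": {"NASA", "Mars"}},
    "d2": {"embedding": [0.9, 0.1], "entities": {"NASA", "Moon"}},
    "d3": {"embedding": [0.0, 1.0], "entities": {"FIFA", "World Cup"}},
    "d4": {"embedding": [0.2, 0.8], "entities": {"FIFA", "UEFA"}},
}

merges = agglomerate(docs)
for left, right, score in merges:
    print(left, "+", right, f"similarity={score:.3f}")
```

The merge history forms a binary hierarchy: documents that share both meaning and entities join first, and the remaining groups join last. Raising `alpha` weights the embeddings more heavily, while lowering it favors shared entities.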

Authors (2)
  1. Sam Yu-Te Lee (4 papers)
  2. Kwan-Liu Ma (79 papers)
Citations (1)