
InsightLens: Augmenting LLM-Powered Data Analysis with Interactive Insight Management and Navigation (2404.01644v2)

Published 2 Apr 2024 in cs.HC

Abstract: The proliferation of LLMs has revolutionized the capabilities of natural language interfaces (NLIs) for data analysis. LLMs can perform complex, multi-step reasoning to generate data insights based on users' analytic intents. However, these insights are often entangled with abundant context in analytic conversations, such as code, visualizations, and natural language explanations. This hinders efficient recording, organization, and navigation of insights within current chat-based LLM interfaces. In this paper, we first conduct a formative study with eight data analysts to understand their general workflow and their pain points in insight management during LLM-powered data analysis. Accordingly, we introduce InsightLens, an interactive system that addresses these challenges. Built upon an LLM-agent-based framework that automates insight recording and organization along with the analysis process, InsightLens visualizes the complex conversational contexts from multiple aspects to facilitate insight navigation. A user study with twelve data analysts demonstrates the effectiveness of InsightLens, showing that it significantly reduces users' manual and cognitive effort without disrupting their conversational data analysis workflow, leading to a more efficient analysis experience.

Authors (8)
  1. Luoxuan Weng
  2. Xingbo Wang
  3. Junyu Lu
  4. Yingchaojie Feng
  5. Yihan Liu
  6. Wei Chen
  7. Haozhe Feng
  8. Danqing Huang