
Preliminary Guidelines For Combining Data Integration and Visual Data Analysis (2403.04757v1)

Published 7 Mar 2024 in cs.HC

Abstract: Data integration is often performed to consolidate information from multiple disparate data sources during visual data analysis. However, integration operations are usually separate from visual analytics operations such as encode and filter in both interface design and empirical research. We conducted a preliminary user study to investigate whether and how data integration should be incorporated directly into the visual analytics process. We used two interface alternatives featuring contrasting approaches to the data preparation and analysis workflow: manual file-based ex-situ integration as a separate step from visual analytics operations; and automatic UI-based in-situ integration merged with visual analytics operations. Participants were asked to complete specific and free-form tasks with each interface, browsing for patterns, generating insights, and summarizing relationships between attributes distributed across multiple files. Analyzing participants' interactions and feedback, we found both task completion time and total interactions to be similar across interfaces and tasks, as well as unique integration strategies between interfaces and emergent behaviors related to satisficing and cognitive bias. Participants' time spent and interactions revealed that in-situ integration enabled users to spend more time on analysis tasks compared with ex-situ integration. Participants' integration strategies and analytical behaviors revealed differences in interface usage for generating and tracking hypotheses and insights. With these results, we synthesized preliminary guidelines for designing future visual analytics interfaces that can support integrating attributes throughout an active analysis process.

Analyzing Data Integration within Visual Analytics Processes

The paper "Preliminary Guidelines For Combining Data Integration and Visual Data Analysis" investigates how data from multiple discrete sources can be integrated within the visual analytics process. It addresses the common separation of data integration and visual analytics in both interface design and empirical research, and argues for the potential benefits of a more integrated approach. The work is anchored on a preliminary user study comparing two contrasting interface designs: manual, file-based ex-situ integration performed as a separate step, and automatic, UI-based in-situ integration merged with visual analytics operations.
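To make the contrast between the two workflows concrete, the following is a minimal sketch in Python. It is not the authors' implementation or the study interface; the data, the `join_on` helper, and the `InSituWorkspace` class are hypothetical names introduced purely for illustration. It assumes two small tables that share a key column, and shows integration done up front (ex-situ) versus integration triggered only when an analysis operation asks for an attribute (in-situ).

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical data: two "files" that share the key column "country".
population: List[Row] = [
    {"country": "A", "population": 10},
    {"country": "B", "population": 20},
]
gdp: List[Row] = [
    {"country": "A", "gdp": 100},
    {"country": "B", "gdp": 300},
]


def join_on(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Inner-join two lists of rows on a shared key column."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def ex_situ(tables: List[List[Row]], key: str) -> List[Row]:
    """Ex-situ style: merge every table first, then analyze the result."""
    merged = tables[0]
    for table in tables[1:]:
        merged = join_on(merged, table, key)
    return merged


class InSituWorkspace:
    """In-situ style: a table is joined only when an attribute is requested."""

    def __init__(self, base: List[Row], others: List[List[Row]], key: str):
        self.table = list(base)      # assumed non-empty for this sketch
        self.others = list(others)
        self.key = key
        self.joins_performed = 0

    def encode(self, attribute: str) -> List[object]:
        """Return the values of an attribute, joining its source if needed."""
        if attribute not in self.table[0]:
            for other in self.others:
                if other and attribute in other[0]:
                    self.table = join_on(self.table, other, self.key)
                    self.joins_performed += 1
                    break
            else:
                raise KeyError(attribute)
        return [row[attribute] for row in self.table]


# Ex-situ: integration happens before any analysis.
merged = ex_situ([population, gdp], "country")

# In-situ: the join is deferred until "gdp" is actually encoded.
workspace = InSituWorkspace(population, [gdp], "country")
pop_values = workspace.encode("population")   # no join needed yet
gdp_values = workspace.encode("gdp")          # triggers one join
```

In this sketch both paths end with the same merged rows; what differs is when the join happens and whether the user has to think about it as a separate step, which is the distinction the study examines.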

The research is structured around two questions: (1) how data integration operations can be supported in tandem with visual analytics operations, and (2) how this incorporation influences user behavior. The study employed interfaces modeled after Polestar, one supporting ex-situ and the other in-situ integration. The findings show that participants treated data integration either as a preliminary step or as something interwoven with the analysis, and that how they allocated their time and which strategies they adopted depended on the integration method.

One notable outcome of the study is that task completion time and total interactions were similar across both interfaces, despite the theoretical advantage of fewer operational steps with in-situ integration. Participants exhibited varied strategies: some preferred to integrate all relevant data beforehand, while others favored an on-the-fly approach. This suggests that the integration approach shapes how users work through a task, even when the overall time and interaction counts stay comparable.

Further, the paper reports that in-situ integration enabled users to spend more time directly on analysis, whereas ex-situ integration shifted more of their time toward data preparation. However, the paper also observes emergent behaviors related to cognitive bias and satisficing, which can detract from comprehensive analysis.

The implications of these findings are both practical and theoretical. Practically, they underscore the need for designers of visual analytics tools to consider how data preparation can be integrated into the analysis process without sacrificing analytical depth. Theoretically, they push the field toward understanding the cognitive consequences of different data integration methods within visual analytics, and point to the need for interfaces that balance automated and manual processes to maintain analytical rigor while improving usability and exploration efficiency.

Looking forward, this paper lays groundwork for visual analytics interfaces that support integrating attributes throughout an active analysis process, offering a more flexible approach to managing and analyzing complex datasets. It contributes to the discussion of how visual analytics can better align with human cognitive processes and support comprehensive analysis as data ecosystems grow more complex. Future work may explore more complex integration tasks, examine users with varying levels of experience, and consider heterogeneous and incomplete datasets.

Authors (4)
  1. Adam Coscia (8 papers)
  2. Ashley Suh (18 papers)
  3. Remco Chang (31 papers)
  4. Alex Endert (40 papers)