Preliminary Guidelines For Combining Data Integration and Visual Data Analysis
Abstract: Data integration is often performed to consolidate information from multiple disparate data sources during visual data analysis. However, integration operations are usually kept separate from visual analytics operations such as encode and filter, in both interface design and empirical research. We conducted a preliminary user study to investigate whether and how data integration should be incorporated directly into the visual analytics process. We used two interface alternatives featuring contrasting approaches to the data preparation and analysis workflow: manual, file-based ex-situ integration performed as a separate step from visual analytics operations, and automatic, UI-based in-situ integration merged with visual analytics operations. Participants were asked to complete specific and free-form tasks with each interface, browsing for patterns, generating insights, and summarizing relationships between attributes distributed across multiple files. Analyzing participants' interactions and feedback, we found that task completion time and total interactions were similar across interfaces and tasks. We also observed integration strategies that differed between interfaces, along with emergent behaviors related to satisficing and cognitive bias. Participants' time spent and interactions revealed that in-situ integration enabled users to spend more time on analysis tasks than ex-situ integration did. Participants' integration strategies and analytical behaviors revealed differences in how the interfaces were used to generate and track hypotheses and insights. From these results, we synthesized preliminary guidelines for designing future visual analytics interfaces that support integrating attributes throughout an active analysis process.