Cognacy Queries over Dependence Graphs for Transparent Visualisations (2403.04403v3)
Abstract: Charts, figures, and text derived from data play an important role in decision making, from data-driven policy development to day-to-day choices informed by online articles. Making sense of, or fact-checking, outputs means understanding how they relate to the underlying data. Even for domain experts with access to the source code and data sets, this poses a significant challenge. In this paper we introduce a new program analysis framework which supports interactive exploration of fine-grained I/O relationships directly through computed outputs, making use of dynamic dependence graphs. Our main contribution is a novel notion in data provenance which we call related inputs, a relation of mutual relevance or "cognacy" which arises between inputs when they contribute to common features of the output. Queries of this form allow readers to ask questions like "What outputs use this data element, and what other data elements are used along with it?". We show how Jonsson and Tarski's concept of conjugate operators on Boolean algebras appropriately characterises the notion of cognacy in a dependence graph, and give a procedure for computing related inputs over such a graph.
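The related-inputs query described in the abstract can be illustrated with a minimal sketch. It assumes the dependence graph is a plain adjacency map from each node to the nodes that depend on it, with inputs as sources and outputs as sinks. The function names, graph, and node labels below are hypothetical and are not taken from the paper; the paper's own procedure may differ.

```python
from collections import deque


def reverse(graph):
    """Return the graph with every edge flipped."""
    rev = {node: set() for node in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, set()).add(src)
    return rev


def reachable(graph, start):
    """All nodes reachable from any node in `start` (including `start` itself)."""
    seen = set(start)
    queue = deque(start)
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def outputs_using(graph, outputs, selected_inputs):
    """Outputs that depend on at least one selected input."""
    return reachable(graph, selected_inputs) & outputs


def inputs_used_by(graph, inputs, selected_outputs):
    """Inputs that at least one selected output depends on."""
    return reachable(reverse(graph), selected_outputs) & inputs


def related_inputs(graph, inputs, outputs, selected_inputs):
    """Inputs sharing an output with the selection: forwards, then backwards."""
    used = outputs_using(graph, outputs, selected_inputs)
    return inputs_used_by(graph, inputs, used)


# Hypothetical dependence graph: four data elements feeding three chart bars.
GRAPH = {
    "a": {"sum_ab"},
    "b": {"sum_ab", "sum_bc"},
    "c": {"sum_bc"},
    "d": {"bar_d"},
    "sum_ab": {"bar_ab"},
    "sum_bc": {"bar_bc"},
    "bar_ab": set(),
    "bar_bc": set(),
    "bar_d": set(),
}
INPUTS = {"a", "b", "c", "d"}
OUTPUTS = {"bar_ab", "bar_bc", "bar_d"}

print(sorted(outputs_using(GRAPH, OUTPUTS, {"a"})))           # ['bar_ab']
print(sorted(related_inputs(GRAPH, INPUTS, OUTPUTS, {"a"})))  # ['a', 'b']
```

In this sketch the forward and backward queries are images of the same relation taken in opposite directions, so a set of outputs meets `outputs_using(X)` exactly when `X` meets `inputs_used_by` of that set. This is the shape of the conjugacy condition the abstract attributes to Jonsson and Tarski.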
- Adaptive functional programming. In POPL ’02: Proceedings of the 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (Portland, Oregon). ACM Press, New York, NY, USA, 247–259. https://doi.org/10.1145/503272.503296
- Adaptive Functional Programming. ACM Trans. Program. Lang. Syst. 28, 6 (Nov 2006), 990–1034. https://doi.org/10.1145/1186632.1186634
- Hiralal Agrawal and Joseph R. Horgan. 1990. Dynamic program slicing. In PLDI ’90: Proceedings of the ACM SIGPLAN 1990 Conference on Programming Language Design and Implementation (White Plains, New York, United States). ACM, New York, NY, USA, 246–256. https://doi.org/10.1145/93542.93576
- Nadieh Bremer and Marlieke Ranzijn. 2015. Urbanization in East Asia between 2000 and 2010. http://nbremer.github.io/urbanization/.
- Provenance as dependency analysis. Mathematical Structures in Computer Science 21, 6 (2011), 1301–1337.
- R. DerSimonian and N. Laird. 1986. Meta-analysis in clinical trials. Control Clin Trials 7, 3 (Sep 1986), 177–188. https://doi.org/10.1016/0197-2456(86)90046-2
- The Program Dependence Graph and Its Use in Optimization. ACM Trans. Program. Lang. Syst. 9, 3 (Jul 1987), 319–349. https://doi.org/10.1145/24039.24041
- John Field and Frank Tip. 1998. Dynamic Dependence in Term Rewriting Systems and its Application to Program Slicing. Information and Software Technology 40, 11–12 (November/December 1998), 609–636.
- Generalized Selection via Interactive Query Relaxation. In ACM Human Factors in Computing Systems (CHI). 959–968.
- Ralf Hinze. 2000. Generalizing generalized tries. Journal of Functional Programming 10, 4 (2000), 327–351. https://doi.org/10.1017/S0956796800003713
- Bjarni Jonsson and Alfred Tarski. 1951. Boolean Algebras with Operators. Part I. American Journal of Mathematics 73, 4 (1951), 891–939. http://www.jstor.org/stable/2372123
- DEVise: integrated querying and visual exploration of large datasets. SIGMOD Rec. 26, 2 (Jun 1997), 301–312. https://doi.org/10.1145/253262.253335
- Quantification of modelling uncertainties in a large ensemble of climate change simulations. Nature 430 (Sep 2004), 768–772. https://doi.org/10.1038/nature02771
- Chris North and Ben Shneiderman. 2000. Snap-together visualization: a user interface for coordinating visualizations via relational schemata. In Proceedings of the Working Conference on Advanced Visual Interfaces (Palermo, Italy) (AVI ’00). Association for Computing Machinery, New York, NY, USA, 128–135. https://doi.org/10.1145/345513.345282
- Functional Programs That Explain Their Work. In Proceedings of the 17th ACM SIGPLAN International Conference on Functional Programming (Copenhagen, Denmark) (ICFP ’12). ACM, New York, NY, USA, 365–376. https://doi.org/10.1145/2364527.2364579
- Causally Consistent Dynamic Slicing. In Concurrency Theory, 27th International Conference, CONCUR ’16 (Leibniz International Proceedings in Informatics (LIPIcs)), Josée Desharnais and Radha Jagadeesan (Eds.). Schloss Dagstuhl–Leibniz-Zentrum für Informatik, Dagstuhl, Germany. https://doi.org/10.4230/LIPIcs.CONCUR.2016.18
- Linked Visualisations via Galois Dependencies. Proc. ACM Program. Lang. 6, POPL, Article 7 (2022), 29 pages. https://doi.org/10.1145/3498668
- Triemaps that match. Technical Report. https://simon.peytonjones.org/triemaps-that-match/
- Fotis Psallidas and Eugene Wu. 2018. Provenance for Interactive Visualizations. In Workshop on Human-In-the-Loop Data Analytics (HILDA 2018). ACM.
- Imperative Functional Programs That Explain Their Work. Proceedings of the ACM on Programming Languages 1, ICFP, Article 14 (2017), 28 pages. https://doi.org/10.1145/3110258
- Vega-Lite: A Grammar of Interactive Graphics. IEEE Trans. Visualization & Comp. Graphics (Proc. InfoVis) (2017).
- Access Path Selection in a Relational Database Management System. In Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data (Boston, Massachusetts) (SIGMOD ’79). Association for Computing Machinery, New York, NY, USA, 23–34. https://doi.org/10.1145/582095.582099
- Locating faults with program slicing: an empirical analysis. Empirical Software Engineering 26, 3 (01 Apr 2021), 51. https://doi.org/10.1007/s10664-020-09931-7
- K. Truemper. 1989. On the delta-wye reduction for planar graphs. Journal of Graph Theory 13, 2 (1989), 141–148. https://doi.org/10.1002/jgt.3190130202