
Efficient and Optimal Algorithms for Tree Summarization with Weighted Terminologies (2008.03053v2)

Published 7 Aug 2020 in cs.DB

Abstract: Data summarization, which presents a small subset of a dataset to users, is widely used in applications and systems. Many datasets are coded with hierarchical terminologies, e.g., the International Classification of Diseases-9 (ICD-9), Medical Subject Headings (MeSH), and Gene Ontology. In this paper, we study the problem of selecting a diverse set of k elements to summarize an input dataset with hierarchical terminologies, and of visualizing the summary in an ontology structure. We propose an efficient greedy algorithm that solves the problem with a (1-1/e) ≈ 63% approximation guarantee. Although the greedy solution comes with this quality guarantee, it is not optimal. To solve the problem optimally, we further develop a dynamic programming algorithm, called OVDO, that obtains optimal answers for graph visualization of log data using ontology terminologies. The complexity and correctness of OVDO are theoretically analyzed. In addition, we propose a tree reduction optimization that removes useless nodes with zero weights and shrinks the tree into a smaller one, which speeds up OVDO in many real-world applications. Extensive experiments on real-world datasets show the effectiveness and efficiency of our proposed approximate and exact algorithms for tree data summarization.
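The abstract does not spell out the objective function, so the following is only a minimal sketch of the general greedy pattern behind a (1-1/e) guarantee for monotone submodular objectives, not the paper's algorithm. It assumes a hypothetical weighted-coverage objective in which a selected node "covers" itself, its parent, and its children; the names `greedy_summary`, `one_hop_cover`, `children`, and `weight` are illustrative and do not come from the paper.

```python
from typing import Dict, List, Set


def one_hop_cover(children: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each node to the set {node, its parent, its children}.

    This coverage rule is an assumption made for illustration only.
    """
    nodes = set(children) | {c for cs in children.values() for c in cs}
    cover = {n: {n} for n in nodes}
    for parent, kids in children.items():
        for kid in kids:
            cover[parent].add(kid)   # a parent covers its child
            cover[kid].add(parent)   # a child covers its parent
    return cover


def greedy_summary(children: Dict[str, List[str]],
                   weight: Dict[str, float],
                   k: int) -> List[str]:
    """Pick up to k nodes, each time taking the largest marginal gain.

    The objective (total weight of covered nodes) is monotone and
    submodular, which is the setting where greedy selection is known
    to reach at least a (1 - 1/e) fraction of the optimum.
    """
    cover = one_hop_cover(children)
    chosen: List[str] = []
    covered: Set[str] = set()
    for _ in range(k):
        best, best_gain = None, 0.0
        # Sorted iteration makes tie-breaking deterministic.
        for n in sorted(set(cover) - set(chosen)):
            gain = sum(weight.get(m, 0.0) for m in cover[n] - covered)
            if gain > best_gain:
                best, best_gain = n, gain
        if best is None:  # no remaining node adds any weight
            break
        chosen.append(best)
        covered |= cover[best]
    return chosen


# Toy hierarchy with made-up weights (e.g., how often a term occurs).
children = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
weight = {"a1": 5.0, "a2": 3.0, "b": 1.0, "b1": 4.0}
print(greedy_summary(children, weight, 2))
```

Each round recomputes marginal gains against what is already covered, which is what steers the selection toward diverse nodes rather than the k heaviest ones. An exact method such as the dynamic programming approach mentioned in the abstract would be needed to guarantee an optimal set.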

Authors (7)
  1. Xuliang Zhu (6 papers)
  2. Xin Huang (222 papers)
  3. Byron Choi (9 papers)
  4. Jianliang Xu (36 papers)
  5. William K. Cheung (17 papers)
  6. Yanchun Zhang (15 papers)
  7. Jiming Liu (19 papers)
Citations (1)