
Personalization of Dataset Retrieval Results using a Metadata-based Data Valuation Method (2407.15546v1)

Published 22 Jul 2024 in cs.IR and cs.DB

Abstract: In this paper, we propose a novel data valuation method for a Dataset Retrieval (DR) use case in Ireland's national mapping agency. To the best of our knowledge, data valuation has not yet been applied to dataset retrieval. By leveraging metadata and a user's preferences, we estimate the personal value of each dataset to facilitate dataset retrieval and filtering. We then validate the data value-based ranking against the stakeholders' ranking of the datasets. The proposed method and use case demonstrate that data valuation is promising for dataset retrieval. For instance, the best-performing dataset retrieval configuration based on our approach obtained an NDCG@5 (Normalized Discounted Cumulative Gain truncated at rank 5) of 0.8207. This study is unique in exploring a data valuation-based approach to dataset retrieval and, unlike most existing methods, it is validated against the stakeholders' ranking of the datasets.
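
For readers unfamiliar with the metric cited in the abstract, the following is a minimal sketch of how NDCG@k is commonly computed. It assumes the linear-gain form of DCG with a log2 position discount; the abstract does not state which variant the authors used, and the function names and relevance scores here are illustrative, not taken from the paper.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items (linear gain, log2 discount)."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """NDCG@k: DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant items at all; define the score as 0
    return dcg_at_k(relevances, k) / ideal_dcg

# Hypothetical example: stakeholder relevance grades of datasets,
# listed in the order a retrieval system ranked them.
system_ranking = [3, 2, 3, 0, 1, 2]
print(round(ndcg_at_k(system_ranking, 5), 4))
```

A score of 1.0 means the top-k results appear in the same order as the ideal (stakeholder) ranking; lower values indicate that highly valued datasets were placed further down the list.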

