
On the selection of the correct number of terms for profile construction: theoretical and empirical analysis (2401.10963v1)

Published 19 Jan 2024 in cs.IR

Abstract: In this paper, we examine the problem of building a user profile from a set of documents. The profile consists of a subset of the terms in the documents that best represent the user's preferences or interests. Inspired by discrete concentration theory, we conduct an axiomatic study of seven properties that a selection function should fulfill: the minimum and maximum uncertainty principles, invariance to adding zeros, invariance to scale transformations, the principle of nominal increase, the transfer principle, and the richest-get-richer inequality. We also present a novel selection function based on similarity metrics, specifically the cosine measure commonly used in information retrieval, and demonstrate that it satisfies six of the properties plus a weaker variant of the transfer principle, which makes it a good selection approach. The theoretical study is complemented by an empirical study that compares the performance of different selection criteria (weight-based and non-weight-based) on real data from a parliamentary setting. In this study, we analyze the performance of the different functions, focusing on the two main factors that affect the selection process: profile size (number of terms) and weight distribution. The resulting profiles are then used in a document filtering task to show that our similarity-based approach performs well in terms of both recommendation accuracy and efficiency (it yields smaller profiles and consequently faster recommendations).
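
The abstract does not give the formula for the cosine-based selection function, so the following is only an illustrative sketch under a stated assumption: the profile size k is chosen to maximize the cosine similarity between the full term-weight vector and the 0/1 indicator vector of the k heaviest terms. The function name `select_profile_size` and this exact criterion are hypothetical and may differ from the paper's definition.

```python
import math


def select_profile_size(weights):
    """Pick how many top-weighted terms to keep in a profile.

    ASSUMPTION (not taken from the paper): k maximizes the cosine between
    the weight vector w and the binary indicator of its k largest entries,
    i.e. cos_k = (w_1 + ... + w_k) / (sqrt(k) * ||w||), with w sorted in
    descending order. Weights are assumed to be non-negative.
    """
    # Zero weights never change the cosine, so they can be dropped up front.
    w = sorted((x for x in weights if x > 0), reverse=True)
    if not w:
        return 0

    norm = math.sqrt(sum(x * x for x in w))
    best_k, best_cos, prefix = 0, -1.0, 0.0
    for k, x in enumerate(w, start=1):
        prefix += x  # running sum of the k largest weights
        cos = prefix / (math.sqrt(k) * norm)
        # Small tolerance so that ties resolve to the smaller profile.
        if cos > best_cos + 1e-12:
            best_k, best_cos = k, cos
    return best_k
```

Under this assumed criterion, a uniform weight vector selects every term and a vector with a single non-zero weight selects exactly one term, which matches the intuition behind the maximum and minimum uncertainty principles. Appending zeros or rescaling all weights leaves the selected size unchanged.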
