Automatic Construction of Multi-faceted User Profiles using Text Clustering and its Application to Expert Recommendation and Filtering Problems (2401.10634v1)

Published 19 Jan 2024 in cs.IR

Abstract: In today's information age, we are interested not only in accessing multimedia objects such as documents and videos, but also in searching for professional experts, other people, or celebrities, whether for professional needs or just for fun. Information access systems need to be able to extract and exploit various sources of information (usually in text format) about such individuals, and to represent them in a suitable way, usually in the form of a profile. In this article, we tackle the problems of profile-based expert recommendation and document filtering from a machine learning perspective by clustering expert textual sources to build profiles and capture the different hidden topics in which the experts are interested. The experts are then represented by means of multi-faceted profiles. Our experiments show that this is a valid technique to improve the performance of expert finding and document filtering.
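
To make the idea of a multi-faceted profile concrete, here is a minimal sketch and not the authors' implementation. It assumes plain term-frequency vectors, cosine similarity, a simple k-means-style loop with a deterministic farthest-document initialization, and a scoring rule that takes the best-matching subprofile. The function names (`tf_vector`, `cosine`, `build_profile`, `score_expert`) and the toy documents are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (assumption: whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_profile(docs, k, iterations=10):
    """Cluster one expert's documents (non-empty list) into k subprofiles.

    Each subprofile is the mean term vector of one cluster.
    """
    vectors = [tf_vector(d) for d in docs]
    # Assumed initialization: start from the first document, then repeatedly
    # add the document least similar to the centroids chosen so far.
    centroids = [Counter(vectors[0])]
    while len(centroids) < k:
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(Counter(far))
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            best = max(range(k), key=lambda i: cosine(v, centroids[i]))
            groups[best].append(v)
        for i, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster ends up empty
                total = Counter()
                for v in group:
                    total.update(v)
                centroids[i] = Counter(
                    {t: c / len(group) for t, c in total.items()}
                )
    return centroids


def score_expert(query, profile):
    """Assumed scoring rule: similarity to the best-matching subprofile."""
    q = tf_vector(query)
    return max(cosine(q, c) for c in profile)


# Toy documents for one hypothetical expert with two distinct interests.
docs = [
    "fishing quotas coastal fishing fleet",
    "fishing ports coastal fleet subsidies",
    "school funding teachers education",
    "education teachers school curriculum",
]
profile = build_profile(docs, k=2)
multi_score = score_expert("fishing fleet", profile)
# Flat baseline: one monolithic profile built from all documents together.
flat_score = cosine(tf_vector("fishing fleet"), tf_vector(" ".join(docs)))
```

In this toy setting, a query about one topic is compared against a subprofile that is not diluted by the expert's other interests, which is the intuition behind splitting a profile into facets. The number of clusters `k` is fixed by hand here; how it is chosen in practice is not stated in the abstract.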
