A Markov Random Field Topic Space Model for Document Retrieval (1111.6640v1)
Abstract: This paper proposes a novel statistical approach to intelligent document retrieval. It seeks to offer a more structured and extensible mathematical approach to the term generalization done in the popular Latent Semantic Analysis (LSA) approach to document indexing. A Markov Random Field (MRF) is presented that captures relationships between terms and documents as probabilistic dependence assumptions between random variables. From there, it uses the MRF-Gibbs equivalence to derive joint probabilities as well as local probabilities for document variables. A parameter learning method is proposed that utilizes rank reduction with singular value decomposition in a matter similar to LSA to reduce dimensionality of document-term relationships to that of a latent topic space. Experimental results confirm the ability of this approach to effectively and efficiently retrieve documents from substantial data sets.
Collections
Sign up for free to add this paper to one or more collections.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.