Document Set Expansion with Positive-Unlabelled Learning Using Intractable Density Estimation (2403.17473v1)
Published 26 Mar 2024 in cs.IR
Abstract: The Document Set Expansion (DSE) task involves identifying relevant documents from large collections based on a limited set of example documents. Previous research has highlighted Positive and Unlabeled (PU) learning as a promising approach for this task. However, most PU methods rely on the unrealistic assumption of knowing the class prior for positive samples in the collection. To address this limitation, this paper introduces a novel PU learning framework that utilizes intractable density estimation models. Experiments conducted on PubMed and Covid datasets in a transductive setting showcase the effectiveness of the proposed method for DSE. Code is available from https://github.com/Beautifuldog01/Document-set-expansion-puDE.
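To make the class-prior dependence mentioned in the abstract concrete, here is a minimal sketch of a standard non-negative PU risk estimator from the earlier PU literature. It is not the method proposed in this paper. The function names, the sigmoid surrogate loss, and the toy scores are assumptions chosen for illustration only; the point is that `class_prior` must be supplied by the user and directly changes the objective.

```python
import math


def sigmoid_loss(score, label):
    """Sigmoid surrogate loss for a real-valued score and a label in {+1, -1}."""
    return 1.0 / (1.0 + math.exp(label * score))


def mean(values):
    return sum(values) / len(values)


def nn_pu_risk(pos_scores, unl_scores, class_prior):
    """Non-negative PU risk (illustrative sketch, not this paper's method).

    pos_scores:  classifier scores on labelled positive documents
    unl_scores:  classifier scores on unlabelled documents
    class_prior: assumed fraction of positives in the collection
    """
    pos_as_pos = mean([sigmoid_loss(s, +1) for s in pos_scores])
    pos_as_neg = mean([sigmoid_loss(s, -1) for s in pos_scores])
    unl_as_neg = mean([sigmoid_loss(s, -1) for s in unl_scores])
    # Estimated risk on the (unobserved) negatives, clamped at zero.
    neg_risk = max(0.0, unl_as_neg - class_prior * pos_as_neg)
    return class_prior * pos_as_pos + neg_risk


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    pos = [2.0, 1.0]
    unl = [1.5, -1.0, -2.0]
    for prior in (0.1, 0.5, 0.9):
        print(prior, nn_pu_risk(pos, unl, prior))
```

Running the script with different priors on the same scores yields different risk values, which is why a misspecified prior can mislead training and why a framework that avoids needing it is attractive.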
Authors: Haiyang Zhang, Qiuyi Chen, Yuanjie Zou, Yushan Pan, Jia Wang, Mark Stevenson