Overview of the TREC 2023 NeuCLIR Track (2404.08071v1)
Abstract: The principal goal of the TREC Neural Cross-Language Information Retrieval (NeuCLIR) track is to study the impact of neural approaches to cross-language information retrieval. The track has created four collections: large collections of Chinese, Persian, and Russian newswire, and a smaller collection of Chinese scientific abstracts. The principal tasks are ranked retrieval of news in one of the three languages, using English topics. Results are also reported for a multilingual task, which also uses English topics but draws documents from all three newswire collections. New in this second year of the track is a pilot technical documents CLIR task for ranked retrieval of Chinese technical documents using English topics. A total of 220 runs across all tasks were submitted by six participating teams and, as baselines, by the track coordinators. Task descriptions and results are presented.
Authors: Dawn Lawrie, Sean MacAvaney, James Mayfield, Paul McNamee, Douglas W. Oard, Luca Soldaini, Eugene Yang