UltraWiki: Ultra-fine-grained Entity Set Expansion with Negative Seed Entities (2403.04247v2)
Abstract: Entity Set Expansion (ESE) aims to identify new entities belonging to the same semantic class as a given set of seed entities. Traditional methods primarily relied on positive seed entities to represent a target semantic class, which makes it difficult to represent ultra-fine-grained semantic classes. Ultra-fine-grained semantic classes are defined on top of fine-grained semantic classes by adding more specific attribute constraints. Describing them with positive seed entities alone causes two issues: (i) ambiguity among ultra-fine-grained semantic classes, and (ii) the inability to express "unwanted" semantics. Because of these inherent shortcomings, previous methods struggle with ultra-fine-grained ESE (Ultra-ESE). To address this, we first introduce negative seed entities into the input; these belong to the same fine-grained semantic class as the positive seed entities but differ in certain attributes. Negative seed entities eliminate semantic ambiguity through the contrast between positive and negative attributes, and they also provide a straightforward way to express what is "unwanted". To assess model performance on Ultra-ESE, we constructed UltraWiki, the first large-scale dataset tailored for Ultra-ESE. UltraWiki encompasses 236 ultra-fine-grained semantic classes, where each query is represented by 3-5 positive and negative seed entities. We propose a retrieval-based framework, RetExpan, and a generation-based framework, GenExpan, to comprehensively assess the efficacy of LLMs on Ultra-ESE under two different paradigms. Moreover, we devised three strategies to enhance models' comprehension of ultra-fine-grained entity semantics: contrastive learning, retrieval augmentation, and chain-of-thought reasoning. Extensive experiments confirm the effectiveness of the proposed strategies and also reveal that there remains substantial room for improvement in Ultra-ESE.
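To make the role of negative seed entities concrete, here is a minimal sketch. It is not the paper's RetExpan or GenExpan method; it assumes that every entity already has an embedding vector and ranks candidates by their mean cosine similarity to the positive seeds minus their mean cosine similarity to the negative seeds. The toy vectors (loosely read as [is_country, is_european, uses_euro]), the `rank_candidates` helper, and the `alpha` weight are hypothetical illustrations. The toy ultra-fine-grained class is "European countries that use the euro", with non-euro European countries as negative seeds.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical hand-made embeddings; dimensions loosely mean
# [is_country, is_european, uses_euro]. Real systems would use learned vectors.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "France": [1.0, 1.0, 1.0],
    "Germany": [1.0, 1.0, 0.9],
    "Sweden": [1.0, 1.0, 0.0],
    "Poland": [1.0, 0.9, 0.0],
    "Italy": [1.0, 1.0, 1.0],
    "Denmark": [1.0, 1.0, 0.1],
    "Japan": [1.0, 0.0, 0.0],
}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_similarity(vec: Vector, seeds: List[Vector]) -> float:
    """Average cosine similarity of one vector to a list of seed vectors."""
    if not seeds:
        return 0.0
    return sum(cosine(vec, s) for s in seeds) / len(seeds)


def rank_candidates(
    embeddings: Dict[str, Vector],
    positive_seeds: List[str],
    negative_seeds: List[str],
    alpha: float = 1.0,
) -> List[Tuple[str, float]]:
    """Rank non-seed entities by score(e) = mean_cos(e, pos) - alpha * mean_cos(e, neg).

    `alpha` is a hypothetical weight on the negative (unwanted-attribute) term;
    alpha=0.0 reduces to ranking with positive seeds only.
    """
    pos_vecs = [embeddings[name] for name in positive_seeds]
    neg_vecs = [embeddings[name] for name in negative_seeds]
    seeds = set(positive_seeds) | set(negative_seeds)
    scored = [
        (name, mean_similarity(vec, pos_vecs) - alpha * mean_similarity(vec, neg_vecs))
        for name, vec in embeddings.items()
        if name not in seeds  # seeds themselves are never returned as expansions
    ]
    # Highest score first: close to the wanted attributes, far from the unwanted ones.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    ranking = rank_candidates(TOY_EMBEDDINGS, ["France", "Germany"], ["Sweden", "Poland"])
    for name, score in ranking:
        print(f"{name}: {score:+.3f}")
```

With the negative seeds, Italy is the only candidate with a positive score, while Denmark (European but not in the euro area) falls below zero. With `alpha=0.0`, i.e. positive seeds only, Denmark still scores about 0.87, which illustrates why positive seeds alone struggle to express the unwanted attribute.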
Authors:
- Yangning Li
- Qingsong Lv
- Tianyu Yu
- Yinghui Li
- Shulin Huang
- Tingwei Lu
- Xuming Hu
- Hai-Tao Zheng
- Hui Wang
- Wenhao Jiang