Best of Both Worlds: A Pliable and Generalizable Neuro-Symbolic Approach for Relation Classification (2403.03305v1)
Abstract: This paper introduces a novel neuro-symbolic architecture for relation classification (RC) that combines rule-based methods with contemporary deep learning techniques. This approach capitalizes on the strengths of both paradigms: the adaptability of rule-based systems and the generalization power of neural networks. Our architecture consists of two components: a declarative rule-based model for transparent classification and a neural component that enhances rule generalizability through semantic text matching. Notably, our semantic matcher is trained in an unsupervised, domain-agnostic way, solely on synthetic data. Further, these components are loosely coupled, allowing rule modifications without retraining the semantic matcher. In our evaluation, we focus on two few-shot relation classification datasets: Few-Shot TACRED and a Few-Shot version of NYT29. We show that our proposed method outperforms previous state-of-the-art models in three out of four settings, despite not seeing any human-annotated training data. Further, we show that our approach remains modular and pliable, i.e., the corresponding rules can be locally modified to improve the overall model. Human interventions in the rules for the TACRED relation `org:parents` boost performance on that relation by as much as 26% relative improvement, without negatively impacting the other relations, and without retraining the semantic matching component.
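To make the loose coupling concrete, the sketch below illustrates the general idea rather than the authors' implementation: each relation is backed by human-editable verbalized rules, and a neural encoder scores a sentence against those verbalizations, so rules can be added or corrected without touching the neural weights. The `RULES` dictionary, the pattern strings, the `all-MiniLM-L6-v2` encoder, and the 0.5 threshold are all illustrative assumptions; the paper's actual matcher is trained unsupervised on synthetic data, and its rules are declarative patterns rather than plain strings.

```python
# Minimal illustrative sketch of a loosely coupled neuro-symbolic RC classifier.
# All names below (RULES, the pattern strings, the model choice, the threshold)
# are hypothetical stand-ins; this is NOT the paper's implementation.
from sentence_transformers import SentenceTransformer, util

# Hypothetical rule store: each relation maps to verbalized rule patterns.
# Editing this dict is the "human intervention"; it requires no retraining.
RULES = {
    "org:parents": [
        "{obj} is the parent company of {subj}",
        "{subj} is a subsidiary of {obj}",
    ],
    "per:employee_of": ["{subj} works for {obj}"],
}

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in semantic matcher


def classify(sentence: str, subj: str, obj: str, threshold: float = 0.5) -> str:
    """Return the relation whose verbalized rules best match the sentence,
    or "no_relation" if no rule scores above the threshold."""
    sent_emb = encoder.encode(sentence, convert_to_tensor=True)
    best_label, best_score = "no_relation", threshold
    for label, patterns in RULES.items():
        # Instantiate the rule templates with the actual entity mentions.
        verbalized = [p.format(subj=subj, obj=obj) for p in patterns]
        rule_embs = encoder.encode(verbalized, convert_to_tensor=True)
        # Semantic matching: best cosine similarity over this relation's rules.
        score = util.cos_sim(sent_emb, rule_embs).max().item()
        if score > best_score:
            best_label, best_score = label, score
    return best_label


print(classify("Acme Corp announced results for its subsidiary Beta LLC.",
               subj="Beta LLC", obj="Acme Corp"))
```

Because the rules live in data rather than in model weights, a fix like the `org:parents` intervention described above amounts to a local edit of the rule store, leaving the semantic matching component untouched.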