MLPs Compass: What is learned when MLPs are combined with PLMs? (2401.01667v1)
Abstract: While Transformer-based pre-trained language models (PLMs) and their variants exhibit strong semantic representation capabilities, how much information is gained from the additional components of PLMs remains an open question. Motivated by recent work showing that Multilayer-Perceptron (MLP) modules achieve robust structure-capturing capabilities, even outperforming Graph Neural Networks (GNNs), this paper quantifies whether simple MLPs can further enhance the already potent ability of PLMs to capture linguistic information. Specifically, we design a simple yet effective probing framework that adds MLP components to the BERT architecture and conduct extensive experiments on 10 probing tasks spanning three distinct linguistic levels. The experimental results demonstrate that MLPs can indeed improve PLMs' comprehension of linguistic structure. Our study provides interpretable and valuable insights into crafting MLP-based variants of PLMs for tasks that emphasize diverse linguistic structures.
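As a concrete illustration of the probing setup the abstract describes, the sketch below attaches a small MLP classifier to a frozen BERT encoder and trains only the probe. The encoder checkpoint, hidden size, two-class output, and use of the [CLS] vector are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (not the authors' code): an MLP probing head on top of a frozen
# BERT encoder. Model name, hidden_dim, and num_classes are assumed for illustration.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel


class MLPProbe(nn.Module):
    """Small MLP classifier probing a frozen PLM's [CLS] representation."""

    def __init__(self, encoder_name: str = "bert-base-uncased",
                 hidden_dim: int = 256, num_classes: int = 2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        for p in self.encoder.parameters():  # freeze the PLM; only the MLP probe is trained
            p.requires_grad = False
        self.mlp = nn.Sequential(
            nn.Linear(self.encoder.config.hidden_size, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():
            out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token as the sentence representation
        return self.mlp(cls)


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    probe = MLPProbe()
    batch = tokenizer(["The cat sat on the mat."], return_tensors="pt")
    logits = probe(batch["input_ids"], batch["attention_mask"])
    print(logits.shape)  # torch.Size([1, 2])
```

In an actual probing experiment, the probe would be trained per task (e.g., a SentEval-style surface, syntactic, or semantic probing task) while the encoder stays fixed, so that probe accuracy reflects what the representations encode rather than what fine-tuning adds.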