
Picking the Underused Heads: A Network Pruning Perspective of Attention Head Selection for Fusing Dialogue Coreference Information (2312.09541v1)

Published 15 Dec 2023 in cs.CL

Abstract: Transformer-based models with multi-head self-attention are widely used in natural language processing and provide state-of-the-art results. While pre-trained language backbones have been shown to implicitly capture certain linguistic knowledge, explicitly incorporating structure-aware features can bring further improvements on downstream tasks. However, such enhancement often requires additional neural components and increases the number of trainable parameters. In this work, we investigate attention head selection and manipulation strategies for feature injection from a network pruning perspective, and conduct a case study on dialogue summarization. We first rank attention heads in a Transformer-based summarizer by layer-wise importance. We then select the underused heads through extensive analysis, and inject structure-aware features by manipulating the selected heads. Experimental results show that importance-based head selection is effective for feature injection, and that dialogue summarization can be improved by incorporating coreference information via head manipulation.
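The abstract outlines a two-step recipe: rank attention heads by layer-wise importance, then repurpose the least important ("underused") heads to carry coreference information. The sketch below illustrates one way this could look, not the authors' released implementation: head importance is scored with the gradient-of-gate proxy from "Are sixteen heads really better than one?" (Michel et al., 2019), which the paper cites, and the coreference injection is modeled as an additive attention bias on the selected heads. The paper's exact importance metric and injection mechanism may differ; every class and function name here is hypothetical.

```python
# Minimal sketch (assumed design, not the paper's code): score heads by a
# gradient-based importance proxy, pick the lowest-importance heads, and steer
# them toward coreferent tokens with an additive attention bias.
import torch
import torch.nn as nn


class HeadGatedSelfAttention(nn.Module):
    """Multi-head self-attention with per-head gates used only for importance scoring."""

    def __init__(self, d_model: int = 64, n_heads: int = 8):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # One multiplicative gate per head; |dLoss/dgate| serves as the
        # layer-wise head-importance proxy (Michel et al., 2019 style).
        self.head_gates = nn.Parameter(torch.ones(n_heads))

    def forward(self, x, attn_bias=None):
        B, T, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(B, T, self.n_heads, self.d_head).transpose(1, 2) for t in (q, k, v))
        logits = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        if attn_bias is not None:            # (B, n_heads, T, T) additive bias
            logits = logits + attn_bias
        attn = logits.softmax(dim=-1)
        ctx = (attn @ v) * self.head_gates.view(1, -1, 1, 1)
        return self.out(ctx.transpose(1, 2).reshape(B, T, -1))


if __name__ == "__main__":
    torch.manual_seed(0)
    layers = nn.ModuleList([HeadGatedSelfAttention() for _ in range(2)])
    x = torch.randn(2, 10, 64)                # (batch, tokens, d_model) toy input

    # Step 1: rank heads per layer by the gradient-based importance proxy.
    h = x
    for layer in layers:
        h = layer(h)
    h.pow(2).mean().backward()                # dummy loss standing in for the summarization loss
    importance = torch.stack([l.head_gates.grad.abs() for l in layers])   # (n_layers, n_heads)
    k_inject = 2
    underused = importance.argsort(dim=-1)[:, :k_inject]  # lowest-importance ("underused") heads
    print("underused heads per layer:", underused.tolist())

    # Step 2 (hypothetical injection): bias the selected heads of layer 0 so they
    # attend between tokens that share a coreference cluster id.
    cluster_ids = torch.randint(0, 3, (2, 10))            # toy coreference clusters per token
    coref_mask = (cluster_ids[:, :, None] == cluster_ids[:, None, :]).float()
    bias = torch.zeros(2, 8, 10, 10)
    bias[:, underused[0]] = 4.0 * coref_mask[:, None]     # boost attention to coreferent positions
    _ = layers[0](x, attn_bias=bias)          # forward pass with coreference-steered heads
```

In an actual summarizer the dummy loss would be the cross-entropy summarization objective and the coreference clusters would come from a coreference resolver run over the dialogue, but the selection-then-injection flow is the same as the one the abstract describes.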

Authors (2)
  1. Zhengyuan Liu (41 papers)
  2. Nancy F. Chen (97 papers)