Predicting Discourse Trees from Transformer-based Neural Summarizers (2104.07058v1)
Abstract: Previous work indicates that discourse information benefits summarization. In this paper, we explore whether this synergy between discourse and summarization is bidirectional by inferring document-level discourse trees from pre-trained neural summarizers. In particular, we generate unlabeled RST-style discourse trees from the self-attention matrices of the transformer model. Experiments across models and datasets reveal that the summarizer learns both dependency- and constituency-style discourse information, typically encoded in a single head and covering both long- and short-distance discourse dependencies. Overall, the experimental results suggest that the learned discourse information is general and transferable across domains.
- Wen Xiao
- Patrick Huber
- Giuseppe Carenini
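
The abstract describes deriving unlabeled discourse trees from a summarizer's self-attention matrices. Below is a minimal sketch of one common way to do this kind of extraction, using a Chu-Liu/Edmonds maximum spanning arborescence over a single head's attention scores between discourse units; the function name, the head/dependent orientation of the attention matrix, and the use of `networkx` are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch: unlabeled dependency tree from one self-attention head's matrix
# over discourse units, via maximum spanning arborescence (an assumption,
# not necessarily the authors' algorithm).
import numpy as np
import networkx as nx


def tree_from_attention(attn: np.ndarray):
    """attn[i, j] = attention weight of unit i attending to unit j.
    Returns sorted (head, dependent) edges of an unlabeled dependency tree."""
    n = attn.shape[0]
    g = nx.DiGraph()
    for dep in range(n):
        for head in range(n):
            if head == dep:
                continue  # skip self-loops
            # Interpret "dep attends to head" as evidence that head governs dep.
            g.add_edge(head, dep, weight=float(attn[dep, head]))
    # Chu-Liu/Edmonds: highest-weight spanning tree where each node has one parent.
    tree = nx.maximum_spanning_arborescence(g, attr="weight")
    return sorted(tree.edges())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    attn = rng.random((5, 5))                 # stand-in for one head over 5 EDUs
    attn /= attn.sum(axis=1, keepdims=True)   # row-normalize, as after softmax
    print(tree_from_attention(attn))
```

A constituency-style tree could instead be obtained by recursively splitting the unit sequence at the point of weakest cross-attention, but the sketch above only covers the dependency-style case.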