
Dependency-based Text Graphs for Keyphrase and Summary Extraction with Applications to Interactive Content Retrieval (1909.09742v1)

Published 20 Sep 2019 in cs.AI

Abstract: We build a bridge between neural network-based machine learning and graph-based natural language processing and introduce a unified approach to keyphrase, summary and relation extraction by aggregating dependency graphs from links provided by a deep-learning based dependency parser. We reorganize dependency graphs to focus on the most relevant content elements of a sentence, integrate sentence identifiers as graph nodes and after ranking the graph, we extract our keyphrases and summaries from its largest strongly-connected component. We take advantage of the implicit structural information that dependency links bring to extract subject-verb-object, is-a and part-of relations. We put it all together into a proof-of-concept dialog engine that specializes the text graph with respect to a query and reveals interactively the document's most relevant content elements. The open-source code of the integrated system is available at https://github.com/ptarau/DeepRank . Keywords: graph-based natural language processing, dependency graphs, keyphrase, summary and relation extraction, query-driven salient sentence extraction, logic-based dialog engine, synergies between neural and symbolic processing.

Citations (3)

Summary

  • The paper introduces a unified algorithm that constructs dependency-based text graphs to simultaneously extract keyphrases, summaries, and relationships.
  • It employs a deep-learning dependency parser to reorganize text into graph nodes, achieving F1 scores competitive with state-of-the-art models and scaling to large documents.
  • The system integrates neural and symbolic NLP techniques to enable interactive, real-time content retrieval, enhancing both theoretical and practical applications.

Dependency-based Text Graphs for Keyphrase and Summary Extraction

The paper introduces a sophisticated approach for keyphrase and summary extraction by constructing dependency-based text graphs. This method bridges neural network-based machine learning capabilities with graph-based NLP. By leveraging dependency graphs derived from a deep-learning dependency parser, the authors present an integrated system for extracting key text elements and relations.

The core approach reorganizes dependency graphs to emphasize the most relevant content elements of each sentence. Sentence identifiers are added as graph nodes alongside words, the resulting graph is ranked, and keyphrases and summaries are read off its largest strongly connected component. The implicit structural information carried by dependency links is further used to extract subject-verb-object, is-a, and part-of relations, marrying syntactic and semantic analysis. Notably, a proof-of-concept dialog engine allows interactive retrieval of a document's salient content in response to specific queries.
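The paragraph above describes a pipeline: parse each sentence, aggregate dependency links into one document-level graph that also contains sentence-identifier nodes, rank that graph, and read keyphrases and summary sentences off its largest strongly connected component. The following is a minimal sketch of that idea using spaCy and networkx; the library choices, the "sent_k" node naming, and the ranking details are illustrative assumptions, not the authors' DeepRank implementation.

```python
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")


def build_text_graph(text):
    """Aggregate dependency links from every sentence into one document graph.

    Content-word lemmas and sentence identifiers ("sent_k") become nodes;
    each dependency arc, plus word<->sentence links, become edges.
    """
    g = nx.DiGraph()
    doc = nlp(text)
    for k, sent in enumerate(doc.sents):
        sent_id = f"sent_{k}"
        for tok in sent:
            if not tok.is_alpha or tok.is_stop:
                continue
            head = tok.head
            # dependency edge: head lemma -> dependent lemma
            if head is not tok and head.is_alpha and not head.is_stop:
                g.add_edge(head.lemma_, tok.lemma_, rel=tok.dep_)
            # bidirectional links between the word and its sentence node
            g.add_edge(tok.lemma_, sent_id)
            g.add_edge(sent_id, tok.lemma_)
    return g


def extract(g, top_phrases=10, top_sents=3):
    """Rank the largest strongly connected component and split the result
    into keyphrase candidates (word nodes) and summary picks (sentence nodes)."""
    core = max(nx.strongly_connected_components(g), key=len)
    scores = nx.pagerank(g.subgraph(core))
    ranked = sorted(scores, key=scores.get, reverse=True)
    keyphrases = [n for n in ranked if not n.startswith("sent_")][:top_phrases]
    summary = [n for n in ranked if n.startswith("sent_")][:top_sents]
    return keyphrases, summary
```

Keyphrases here are single lemmas; the paper assembles multi-word phrases by following dependency links between highly ranked words, a step omitted in this sketch.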

Methodological Advances

This unified algorithm significantly contributes to the field of NLP by:

  • Consolidating keyphrase, summary, and relation extraction processes into a single algorithm.
  • Outperforming prior graph-based systems while remaining competitive with neural state-of-the-art models, and scaling to large documents.
  • Integrating a logic-based post-processing engine that supports real-time, interactive content retrieval (a query-driven ranking step is sketched below).
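The query-driven retrieval mentioned in the last bullet can be illustrated, under the same assumptions as the previous sketch, by biasing the graph ranking toward a query: personalized PageRank concentrates score mass on query terms and on the sentence nodes connected to them. This is only an approximation of the mechanism; the paper's actual dialog engine is logic-based and operates over the extracted relations. The function below reuses the `nlp` model and the graph `g` built earlier.

```python
import networkx as nx


def answer_query(g, query, top_sents=3):
    """Specialize the text graph with respect to a query by seeding
    personalized PageRank with the query's content-word lemmas, then
    return the highest-ranked sentence nodes."""
    seeds = {tok.lemma_: 1.0 for tok in nlp(query)
             if tok.is_alpha and not tok.is_stop and tok.lemma_ in g}
    if not seeds:
        return []
    scores = nx.pagerank(g, personalization=seeds)
    sents = [n for n in g if n.startswith("sent_")]
    return sorted(sents, key=scores.get, reverse=True)[:top_sents]
```

In a dialog loop, each new query simply re-ranks the same document graph, which is what makes interactive, real-time exploration feasible once the graph has been built.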

Numerical Results and Claims

The authors conducted quantitative evaluations on the Krapivin document set. Their algorithm surpassed existing graph-based systems in keyphrase extraction and achieved F1 scores competitive with state-of-the-art neural models such as CopyRNN. Scalability was demonstrated by fast processing times even on large documents, underscoring the system's practical viability.

Implications and Future Perspectives

This research carries both theoretical and practical implications. Theoretically, it exemplifies how symbolic reasoning can complement neural methodologies, offering a robust framework for text interpretation. Practically, the implementation demonstrates potential applications in interactive content retrieval and summarization, highlighting the system's utility in educational and informational contexts.

Future developments in AI could build on this work to enhance the interpretability and precision of NLP systems. By integrating more refined semantic processing and expanding to multilingual contexts through Universal Dependencies, such systems could gain broader applicability and impact.

In conclusion, the paper effectively presents a cohesive narrative that aligns dependency parsing with innovative graph-based analyses, marking a step forward in the convergence of neural and symbolic NLP techniques.
