ASER: Towards Large-scale Commonsense Knowledge Acquisition via Higher-order Selectional Preference over Eventualities (2104.02137v2)

Published 5 Apr 2021 in cs.AI and cs.CL

Abstract: Commonsense knowledge acquisition and reasoning have long been a core artificial intelligence problem. However, in the past, there has been a lack of scalable methods to collect commonsense knowledge. In this paper, we propose to develop principles for collecting commonsense knowledge based on selectional preference. We generalize the definition of selectional preference from one-hop linguistic syntactic relations to higher-order relations over linguistic graphs. Unlike previous commonsense knowledge definitions (e.g., ConceptNet), selectional preference (SP) knowledge only relies on statistical distribution over linguistic graphs, which can be efficiently and accurately acquired from the unlabeled corpus with modern tools. Following this principle, we develop a large-scale eventuality (a linguistic term covering activity, state, and event)-based knowledge graph ASER, where each eventuality is represented as a dependency graph, and the relation between them is a discourse relation defined in shallow discourse parsing. The higher-order selectional preference over collected linguistic graphs reflects various kinds of commonsense knowledge. Moreover, motivated by the observation that humans understand events by abstracting the observed events to a higher level and can thus transfer their knowledge to new events, we propose a conceptualization module to significantly boost the coverage of ASER. In total, ASER contains 648 million edges between 438 million eventualities. After conceptualization with Probase, a selectional preference based concept-instance relational knowledge base, our concept graph contains 15 million conceptualized eventualities and 224 million edges between them. Detailed analysis is provided to demonstrate its quality. All the collected data, APIs, and tools are available at https://github.com/HKUST-KnowComp/ASER.


Overview of ASER: Large-scale Commonsense Knowledge Acquisition

Representing and acquiring commonsense knowledge effectively has long been a central challenge in artificial intelligence research. In "ASER: Towards Large-scale Commonsense Knowledge Acquisition via Higher-order Selectional Preference over Eventualities," the authors propose a framework for capturing such knowledge systematically. They introduce ASER, a knowledge graph built on selectional preference generalized from one-hop syntactic relations to higher-order relations over linguistic graphs, which are extracted from large unlabeled text corpora.

Central to ASER's approach is the eventuality, a linguistic term covering activities, states, and events, which serves as the fundamental semantic unit. Earlier commonsense knowledge bases such as ConceptNet rely heavily on human-annotated relational triples, which are costly to collect and hard to scale. In contrast, ASER extracts knowledge from statistical patterns over the dependency graphs of eventualities and the discourse relations between them. Because these statistics can be computed with modern NLP tools, ASER can harvest large quantities of commonsense knowledge from raw, unlabeled text without relying on a manually designed annotation schema.
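To make "statistical patterns" concrete, the sketch below turns co-occurrence counts of eventuality pairs into a preference score. It is a minimal illustration under two stated assumptions: the observed edges are made-up toy data, and pointwise mutual information (PMI) is used as the preference measure. PMI is a common choice for this kind of association score but is not necessarily the scoring used by ASER; the names `OBSERVED_EDGES` and `pmi_scores` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head eventuality, discourse relation, tail eventuality)

# Made-up toy observations; in ASER such edges would be extracted from a large corpus.
OBSERVED_EDGES: List[Triple] = [
    ("I am hungry", "Result", "I eat food"),
    ("I am hungry", "Result", "I eat food"),
    ("I am hungry", "Result", "I eat food"),
    ("I am hungry", "Result", "I sleep"),
    ("I am tired", "Result", "I sleep"),
    ("I am tired", "Result", "I sleep"),
    ("I am tired", "Result", "I eat food"),
]


def pmi_scores(edges: List[Triple]) -> Dict[Triple, float]:
    """Score each observed (head, relation, tail) triple by PMI within its relation.

    A positive score means head and tail co-occur under the relation more often
    than their individual frequencies would predict. PMI is an illustrative
    choice here, not necessarily the scoring function used by ASER.
    """
    triple_counts = Counter(edges)
    head_counts = Counter((h, r) for h, r, _ in edges)
    tail_counts = Counter((r, t) for _, r, t in edges)
    relation_counts = Counter(r for _, r, _ in edges)

    scores: Dict[Triple, float] = {}
    for (h, r, t), n in triple_counts.items():
        total = relation_counts[r]
        p_joint = n / total                     # P(head, tail | relation)
        p_head = head_counts[(h, r)] / total    # P(head | relation)
        p_tail = tail_counts[(r, t)] / total    # P(tail | relation)
        scores[(h, r, t)] = math.log(p_joint / (p_head * p_tail))
    return scores


if __name__ == "__main__":
    # Print triples from most to least preferred.
    for triple, score in sorted(pmi_scores(OBSERVED_EDGES).items(), key=lambda kv: -kv[1]):
        print(f"{score:+.3f}  {triple}")
```

With these toy counts, the pairs ("I am tired", "I sleep") and ("I am hungry", "I eat food") receive positive scores, while the two crossed pairs receive negative ones, which is the kind of preference signal the summary describes.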

The methodology has two main stages: linguistic pattern extraction and conceptualization. Eventualities are extracted by dependency-parsing sentences and matching the parses against a predefined set of dependency patterns, chosen so that each extracted eventuality is semantically complete without being unnecessarily complex. For instance, "I eat food" matches a subject-verb-object pattern, and ASER records how often each such eventuality occurs; these frequency distributions are what encode selectional preference. Relations between eventualities come from shallow discourse parsing restricted to explicit discourse relations, that is, relations signaled by a connective such as "because". This restriction favors precision over recall, yielding an extraction process that is reliable and still scales to very large corpora.
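The two extraction steps can be sketched as follows. This is a simplified illustration, not ASER's pipeline: ASER runs a real dependency parser and a shallow discourse parser, whereas here the parses are hand-written dependency triples. The pattern names, the small pattern subset, the Stanford-style labels (`nsubj`, `dobj`, `cop`), the connective-to-relation table, and the function names `match_eventuality` and `discourse_edge` are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

Dep = Tuple[str, str, str]  # (head word, dependency label, dependent word)

# Hypothetical, simplified subset of eventuality patterns. Each pattern lists
# the slots of the eventuality in surface order; "HEAD" is the predicate itself.
# ASER's actual pattern set is larger; more specific patterns are tried first.
PATTERNS: Dict[str, Tuple[str, ...]] = {
    "s-v-o": ("nsubj", "HEAD", "dobj"),
    "s-be-a": ("nsubj", "cop", "HEAD"),
    "s-v": ("nsubj", "HEAD"),
}

# Hypothetical connective-to-relation table standing in for a shallow
# discourse parser; only explicit connectives are handled.
CONNECTIVES: Dict[str, str] = {
    "because": "Reason",
    "so": "Result",
    "if": "Condition",
    "but": "Contrast",
}


def match_eventuality(predicate: str, deps: List[Dep]) -> Optional[Tuple[str, str]]:
    """Return (pattern name, eventuality text) for the first pattern the predicate fills."""
    # Collect the predicate's direct dependents, keyed by dependency label.
    args = {label: dep for head, label, dep in deps if head == predicate}
    for name, slots in PATTERNS.items():
        if all(slot == "HEAD" or slot in args for slot in slots):
            words = [predicate if slot == "HEAD" else args[slot] for slot in slots]
            return name, " ".join(words)
    return None


def discourse_edge(left: str, connective: str, right: str) -> Optional[Tuple[str, str, str]]:
    """Map an explicit connective between two eventualities to a relation edge."""
    relation = CONNECTIVES.get(connective.lower())
    return (left, relation, right) if relation else None


if __name__ == "__main__":
    # Hand-written toy parse for "I eat food because I am hungry".
    clause_1 = [("eat", "nsubj", "I"), ("eat", "dobj", "food")]
    clause_2 = [("hungry", "nsubj", "I"), ("hungry", "cop", "am")]
    _, left = match_eventuality("eat", clause_1)
    _, right = match_eventuality("hungry", clause_2)
    print(discourse_edge(left, "because", right))
```

Running the demo prints the edge ("I eat food", "Reason", "I am hungry"). Ordering the patterns from most to least specific keeps "I eat food" from being truncated to the bare subject-verb form "I eat".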

Once the instance-level graph is built, ASER uses an external taxonomy, Probase, to perform conceptualization: specific eventualities are abstracted into more general ones, mirroring how humans transfer what they know about observed events to new ones. This moves ASER beyond instance-level observations and improves generalization, which matters because much commonsense knowledge is rarely stated explicitly in text, so many specific eventualities are observed only a few times or not at all.
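A minimal sketch of the conceptualization idea, assuming a tiny hand-written isA table in place of Probase: each word with isA entries is replaced by its most probable concept, and instance frequencies are pooled into expected frequencies of the conceptualized eventualities. The table `ISA`, its probabilities, and the functions `conceptualize` and `concept_counts` are hypothetical; ASER's actual conceptualization module is more involved than this single-word substitution.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical Probase-style isA table: instance -> [(concept, P(concept | instance))].
# The entries and probabilities are invented for illustration.
ISA: Dict[str, List[Tuple[str, float]]] = {
    "apple": [("fruit", 0.7), ("company", 0.3)],
    "banana": [("fruit", 0.9), ("crop", 0.1)],
}


def conceptualize(eventuality: str, top_k: int = 1) -> List[Tuple[str, float]]:
    """Replace one word at a time with each of its top-k concepts.

    Returns (conceptualized eventuality, probability) pairs.
    """
    tokens = eventuality.split()
    results: List[Tuple[str, float]] = []
    for i, token in enumerate(tokens):
        candidates = sorted(ISA.get(token, []), key=lambda c: c[1], reverse=True)[:top_k]
        for concept, prob in candidates:
            results.append((" ".join(tokens[:i] + [concept] + tokens[i + 1:]), prob))
    return results


def concept_counts(instance_counts: Dict[str, int], top_k: int = 1) -> Dict[str, float]:
    """Aggregate instance frequencies into expected frequencies of conceptualized eventualities."""
    totals: Dict[str, float] = defaultdict(float)
    for eventuality, count in instance_counts.items():
        for concept_ev, prob in conceptualize(eventuality, top_k):
            totals[concept_ev] += count * prob
    return dict(totals)


if __name__ == "__main__":
    print(concept_counts({"I eat apple": 10, "I eat banana": 5}))
```

Here the two sparse instance eventualities pool into a single "I eat fruit" node with expected count 11.5 (10 × 0.7 + 5 × 0.9). With `top_k=2`, "I eat apple" would also yield the implausible "I eat company", one illustration of why the context of an eventuality matters when choosing concepts.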

In total, ASER contains 438 million eventualities connected by 648 million edges across its eventuality patterns, which allows it to cover a broader spectrum of commonsense relations than prior resources. The paper evaluates ASER's quality intrinsically, through human evaluation and statistical analysis, and extrinsically, showing that knowledge in human-curated resources such as ConceptNet can be recovered from ASER's higher-order selectional preference.
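A quick calculation with the figures reported in the abstract (648 million edges over 438 million eventualities before conceptualization; 224 million edges over 15 million conceptualized eventualities after) suggests why conceptualization boosts coverage. The ratio below is simply edges per node:

```latex
\frac{648 \times 10^{6}\ \text{edges}}{438 \times 10^{6}\ \text{eventualities}} \approx 1.48
\qquad\text{vs.}\qquad
\frac{224 \times 10^{6}\ \text{edges}}{15 \times 10^{6}\ \text{conceptualized eventualities}} \approx 14.9
```

Each conceptualized node thus carries roughly ten times as many edges on average, presumably because many sparse instance-level edges collapse onto shared concepts. This is only an average and says nothing about the quality of individual edges.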

ASER's development has both practical and theoretical implications. Practically, it provides a large repository for commonsense inference in applications such as dialogue systems and reading comprehension, illustrating its utility across various benchmarks. Theoretically, it suggests that selectional preference can serve as a scalable, measurable conduit for generalizable semantic knowledge.

ASER also opens the door to future work, notably on context-aware conceptualization, computational scalability, and evaluations that target commonsense reasoning more directly. The graph further suggests a direction in which ASER could supplement pre-trained language models, improving comprehension by supplying complex event knowledge. Finally, the public release of ASER's data, APIs, and tools at https://github.com/HKUST-KnowComp/ASER makes the resource broadly usable and supports collaborative progress toward commonsense understanding.