Symbolic Knowledge Distillation: from General Language Models to Commonsense Models (2110.07178v2)

Published 14 Oct 2021 in cs.CL

Abstract: The common practice for training commonsense models has gone from-human-to-corpus-to-machine: humans author commonsense knowledge graphs in order to train commonsense models. In this work, we investigate an alternative, from-machine-to-corpus-to-machine: general LLMs author these commonsense knowledge graphs to train commonsense models. Our study leads to a new framework, Symbolic Knowledge Distillation. As with prior art in Knowledge Distillation (Hinton et al., 2015), our approach uses larger models to teach smaller models. A key difference is that we distill knowledge symbolically, as text, in addition to the neural model. We also distill only one aspect, the commonsense of a general LLM teacher, allowing the student to be a different type, a commonsense model. Altogether, we show that careful prompt engineering and a separately trained critic model allow us to selectively distill high-quality causal commonsense from GPT-3, a general LLM. Empirical results demonstrate that, for the first time, a human-authored commonsense knowledge graph is surpassed by our automatically distilled variant in all three criteria: quantity, quality, and diversity. In addition, it results in a neural commonsense model that surpasses the teacher model's commonsense capabilities despite its 100x smaller size. We apply this to the ATOMIC resource, and share our new symbolic knowledge graph and commonsense models.

Insights into Symbolic Knowledge Distillation for Commonsense Models

The paper "Symbolic Knowledge Distillation: From General LLMs to Commonsense Models" presents a systematic approach to enhance commonsense reasoning models by leveraging knowledge distillation techniques from large, general-purpose LLMs like GPT-3. This work introduces the framework of Symbolic Knowledge Distillation (SKD), which distills the distilled knowledge in natural language form rather than solely through neural representations.

Overview of Symbolic Knowledge Distillation

Traditionally, commonsense models are trained on knowledge graphs manually authored by humans. This process is labor-intensive, costly, and hard to scale. SKD proposes an alternative pipeline in which machine-generated knowledge replaces human-authored knowledge as the foundation for training commonsense models. This transition follows a from-machine-to-corpus-to-machine approach: a general LLM generates a symbolic knowledge graph, which is then used to train more compact commonsense models.
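To make the generation step concrete, the sketch below shows one way such a pipeline could be wired up. It is a minimal sketch under stated assumptions: `query_llm` is a hypothetical stand-in for a call to the teacher LLM (GPT-3 in the paper), and the few-shot examples and prompt template are illustrative, not the paper's exact prompts.

```python
# Minimal sketch of the from-machine-to-corpus-to-machine generation step.
# `query_llm` is a hypothetical callable wrapping the teacher LLM (GPT-3 in
# the paper); the few-shot examples and prompt template are illustrative only.

FEW_SHOT_EXAMPLES = [
    ("PersonX goes to the mall", "xEffect", "PersonX buys new clothes"),
    ("PersonX fails the exam", "xReact", "PersonX feels disappointed"),
]

def build_prompt(event: str, relation: str) -> str:
    """Assemble a few-shot prompt asking the teacher for one inference."""
    blocks = [
        f"Event: {ev}\nRelation: {rel}\nInference: {inf}\n"
        for ev, rel, inf in FEW_SHOT_EXAMPLES
    ]
    blocks.append(f"Event: {event}\nRelation: {relation}\nInference:")
    return "\n".join(blocks)

def generate_triples(events, relations, query_llm):
    """Yield (event, relation, inference) triples authored by the teacher LLM."""
    for event in events:
        for relation in relations:
            inference = query_llm(build_prompt(event, relation)).strip()
            yield (event, relation, inference)
```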

In alignment with knowledge distillation principles, SKD uses a large, general LLM to generate candidate commonsense knowledge as text, which it then selectively filters with a critic model. The critic is a separately trained component that assesses the quality of each generated statement and discards low-quality candidates. By combining strategic prompt engineering with this filtering step, SKD extracts and distills the commonsense reasoning capabilities of GPT-3.
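A minimal sketch of the filtering step follows, assuming the critic has already been trained (the paper's critic is a fine-tuned RoBERTa classifier fit to human acceptability judgments). Here `critic_score` is a hypothetical wrapper around such a model, and the acceptance threshold is illustrative only.

```python
# Minimal sketch of critic-based filtering. `critic_score` is a hypothetical
# wrapper around a trained acceptability classifier; the threshold value is
# illustrative only.

def filter_triples(candidate_triples, critic_score, threshold=0.5):
    """Keep only the generated triples the critic judges acceptable."""
    kept = []
    for event, relation, inference in candidate_triples:
        statement = f"{event} {relation} {inference}"
        if critic_score(statement) >= threshold:
            kept.append((event, relation, inference))
    return kept

if __name__ == "__main__":
    # Toy critic stand-in: scores plausible-sounding inferences higher.
    dummy_critic = lambda text: 0.9 if "feels" in text else 0.2
    candidates = [
        ("PersonX fails the exam", "xReact", "PersonX feels disappointed"),
        ("PersonX fails the exam", "xReact", "the moon turns purple"),
    ]
    print(filter_triples(candidates, dummy_critic))
    # -> [('PersonX fails the exam', 'xReact', 'PersonX feels disappointed')]
```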

Key Empirical Findings

The empirical findings underscore several significant results:

  1. Scale, Quality, and Diversity: The automatically distilled knowledge graph surpasses the human-authored one not only in scale but also in quality and diversity, a noteworthy achievement for SKD.
  2. Model Performance: A neural model trained on the distilled knowledge surpasses the commonsense reasoning capabilities of GPT-3 itself despite having roughly 100x fewer parameters. This smaller student, termed COMET, achieves superior accuracy on commonsense inference tasks (a minimal training sketch follows this list).
  3. Cost Efficiency: The transition from human-generated to machine-authored knowledge graphs results in a dramatically lower cost per triple, making SKD an economically viable alternative to traditional methods.
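To illustrate the student-training step referenced in item 2, the sketch below fine-tunes a small GPT-2 checkpoint on serialized triples with Hugging Face Transformers. This is a sketch under stated assumptions: the serialization template, tiny inline dataset, model size, and hyperparameters are placeholders, whereas the paper's COMET student is trained on the full distilled ATOMIC-style corpus with a larger GPT-2 variant.

```python
# Minimal sketch of training the student on distilled triples with Hugging Face
# Transformers. The serialization template, inline dataset, model size, and
# hyperparameters are illustrative placeholders.
from datasets import Dataset
from transformers import (DataCollatorForLanguageModeling, GPT2LMHeadModel,
                          GPT2TokenizerFast, Trainer, TrainingArguments)

# A couple of already-filtered (event, relation, inference) triples.
distilled_triples = [
    ("PersonX fails the exam", "xReact", "PersonX feels disappointed"),
    ("PersonX goes to the mall", "xEffect", "PersonX buys new clothes"),
]

def serialize(event, relation, inference):
    """Turn one triple into a single training string (template is assumed)."""
    return f"{event} {relation} [GEN] {inference}"

texts = [serialize(*t) for t in distilled_triples]

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token           # GPT-2 has no pad token
model = GPT2LMHeadModel.from_pretrained("gpt2")

dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=64),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="student-sketch",
                           per_device_train_batch_size=2,
                           num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```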

Implications and Speculations

The introduction of SKD holds substantial implications for both practical applications and theoretical advancements in AI:

  • Practical Implementation: SKD provides an efficient framework for building robust commonsense reasoning models without incurring the prohibitive costs of human annotation. This points towards more sustainable practices in AI model training by reducing human labor costs and improving scalability.
  • Theoretical Advancements: By successfully distilling commonsense knowledge from a general LLM, SKD opens avenues for exploring which other specialized capabilities can be extracted from general models. The framework sets a precedent for future research into distilling specific functionalities from otherwise general-purpose models.
  • Future Developments: While promising, SKD's primary focus is on causal commonsense reasoning. Future research could explore extending this method to other types of commonsense, such as physical or temporal reasoning, thereby broadening the applicability of the distilled models.

In conclusion, Symbolic Knowledge Distillation marks a methodical shift in how commonsense AI models are constructed and trained. The demonstrated success of SKD in leveraging the capabilities of LLMs is a noteworthy stride towards more capable, resource-efficient AI systems with reasoning closer to human commonsense. As research progresses, symbolic knowledge distillation is likely to combine with emerging AI methodologies to produce increasingly sophisticated inference models across domains.

Authors (9)
  1. Peter West (76 papers)
  2. Chandra Bhagavatula (46 papers)
  3. Jack Hessel (50 papers)
  4. Jena D. Hwang (36 papers)
  5. Liwei Jiang (53 papers)
  6. Ronan Le Bras (56 papers)
  7. Ximing Lu (52 papers)
  8. Sean Welleck (54 papers)
  9. Yejin Choi (287 papers)
Citations (293)