Commonsense Knowledge Mining from Pretrained Models (1909.00505v1)

Published 2 Sep 2019 in cs.CL, cs.AI, and cs.LG

Abstract: Inferring commonsense knowledge is a key challenge in natural language processing, but due to the sparsity of training data, previous work has shown that supervised methods for commonsense knowledge mining underperform when evaluated on novel data. In this work, we develop a method for generating commonsense knowledge using a large, pre-trained bidirectional language model. By transforming relational triples into masked sentences, we can use this model to rank a triple's validity by the estimated pointwise mutual information between the two entities. Since we do not update the weights of the bidirectional model, our approach is not biased by the coverage of any one commonsense knowledge base. Though this method performs worse on a test set than models explicitly trained on a corresponding training set, it outperforms these methods when mining commonsense knowledge from new sources, suggesting that unsupervised techniques may generalize better than current supervised approaches.

Commonsense Knowledge Mining from Pretrained Models: An Analysis

The paper "Commonsense Knowledge Mining from Pretrained Models," authored by Joshua Feldman, Joe Davison, and Alexander M. Rush, addresses a significant challenge in the field of NLP: inferring commonsense knowledge. The authors propose an unsupervised method that leverages a large, pre-trained bidirectional LLM for generating commonsense knowledge, an approach diverging from traditional supervised methods.

The essence of this research lies in transforming relational triples into masked sentences. This transformation allows the language model to rank the validity of triples using estimated pointwise mutual information between entities. Importantly, the method does not involve updating the weights of the bidirectional model, thus avoiding bias towards any specific commonsense knowledge base. While the performance of this model on test sets is inferior when compared to models trained on corresponding training sets, it exhibits superior capabilities in mining commonsense knowledge from new data sources, such as Wikipedia. This finding suggests a promising direction wherein unsupervised techniques might generalize better on novel data compared to existing supervised approaches.

Methodological Approach

The core of the methodology involves converting commonsense triples into natural language sentences using hand-crafted templates. A bidirectional masked language model, such as BERT, is employed to evaluate the likelihood of these sentences, acting as a proxy for their truthfulness. The approach hinges on the estimation of pointwise mutual information (PMI) between the components of a triple, calculated using masked language modeling.
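
The PMI estimation can be pictured with a much-simplified sketch. The snippet below assumes the HuggingFace transformers library and bert-large-uncased, restricts the tail entity to a single wordpiece, and uses an illustrative template; it is not the authors' implementation.

```python
# Illustrative sketch only: approximate PMI between the tail and head of a
# triple, conditioned on the relation, by comparing log p(tail | head, relation)
# against log p(tail | relation) under a masked language model.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
model = BertForMaskedLM.from_pretrained("bert-large-uncased").eval()

def masked_log_prob(masked_sentence: str, target: str, mask_index: int = 0) -> float:
    """Log-probability of `target` (assumed to be one wordpiece) at the
    `mask_index`-th [MASK] token in `masked_sentence`."""
    target_id = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(target))[0]
    inputs = tokenizer(masked_sentence, return_tensors="pt")
    mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().flatten()
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_positions[mask_index]]
    return torch.log_softmax(logits, dim=-1)[target_id].item()

def pmi_score(head: str, tail: str, template: str) -> float:
    """`template` has {head} and {tail} slots, with the head appearing first."""
    mask = tokenizer.mask_token
    # log p(tail | head, relation): only the tail slot is masked.
    conditional = masked_log_prob(template.format(head=head, tail=mask), tail)
    # log p(tail | relation): mask the head slot as well; the tail is then the
    # second [MASK] token in the sentence.
    marginal = masked_log_prob(template.format(head=mask, tail=mask), tail, mask_index=1)
    return conditional - marginal

# Example with a hand-written CapableOf template (illustrative wording).
print(pmi_score("a musician", "music", "{head} is capable of playing {tail}."))
```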

Two types of language models form the bedrock of this method:

  1. Unidirectional language models: used to evaluate the coherency of candidate sentences.
  2. Bidirectional masked language models: used to estimate the conditional probabilities needed to compute PMI, thereby scoring the validity of a triple.
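
As a hedged illustration of the first role, one could rank the candidate sentences produced for a triple by their average token log-likelihood under a unidirectional model and keep the most fluent one before PMI scoring. GPT-2 is used below purely as an example model, an assumption rather than a detail taken from the paper.

```python
# Illustrative sketch: score candidate sentences for the same triple with a
# unidirectional language model and keep the most coherent one.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

gpt2_tok = GPT2Tokenizer.from_pretrained("gpt2")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def avg_log_likelihood(sentence: str) -> float:
    ids = gpt2_tok(sentence, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        # With labels supplied, the model returns mean cross-entropy over tokens.
        loss = gpt2(ids, labels=ids).loss
    return -loss.item()

candidates = [
    "A musician can play music.",
    "A musician is capable of playing music.",
    "Musician capable of play music.",
]
best = max(candidates, key=avg_log_likelihood)
print(best)
```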

Sentence generation critically influences the accuracy of this approach: error-free, semantically correct sentence construction is vital. This step applies grammatical transformations, such as introducing articles or converting infinitive verbs to gerunds, to make the generated sentences more robust to syntactic errors.
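
As a rough illustration of this step, the sketch below generates candidate sentences from hand-written templates and applies two naive transformations, article insertion and infinitive-to-gerund conversion. The template wordings and rules are hypothetical, not the paper's actual template set; ill-formed candidates would be filtered out later by the unidirectional model.

```python
# Hypothetical templates and grammatical transformations (illustrative only).
TEMPLATES = {
    "CapableOf": ["{head} is capable of {tail}.", "{head} can {tail}."],
    "HasProperty": ["{head} is {tail}.", "{head} has the property of being {tail}."],
}

def to_gerund(phrase: str) -> str:
    """Naive infinitive-to-gerund rule, e.g. 'play music' -> 'playing music'."""
    verb, _, rest = phrase.partition(" ")
    gerund = verb[:-1] + "ing" if verb.endswith("e") else verb + "ing"
    return f"{gerund} {rest}".strip()

def with_article(noun_phrase: str) -> str:
    """Prepend an indefinite article when the phrase lacks a determiner."""
    if noun_phrase.split()[0].lower() in {"a", "an", "the"}:
        return noun_phrase
    article = "an" if noun_phrase[0].lower() in "aeiou" else "a"
    return f"{article} {noun_phrase}"

def candidate_sentences(head: str, relation: str, tail: str) -> list[str]:
    heads = {head, with_article(head)}
    tails = {tail, with_article(tail), to_gerund(tail)}
    return [t.format(head=h, tail=x).capitalize()
            for t in TEMPLATES[relation] for h in heads for x in tails]

print(candidate_sentences("musician", "CapableOf", "play music"))
```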

Results and Implications

The authors conducted experiments encompassing commonsense knowledge base completion and Wikipedia mining. In the former, the technique achieved a test set F1 score of 78.8, which, while less impressive than some supervised methods, was on par with others. Results in Wikipedia mining were noteworthy, as the model demonstrated a stronger ability to generalize than supervised counterparts, achieving an average quality score of 3.00 for identified triples.

One significant observation is that sentence generation plays a pivotal role in model efficacy; misinterpretations or grammatical errors in sentence formulation adversely affect results. Understanding these impacts provides insights into potential avenues for further improvement, such as refining template designs to better encode the meaning of each relation.

Future Developments and Theoretical Implications

The research suggests several intriguing directions for future exploration. Extending this method to mine factual knowledge beyond commonsense, or to generate new commonsense knowledge outright, could be beneficial. Moreover, the development of more comprehensive evaluation protocols might strengthen the validity of conclusions drawn from commonsense knowledge mining.

In theoretical terms, the paper challenges the traditional reliance on explicit training data sets and points towards the efficacy of leveraging large-scale, pre-trained models for unsupervised learning tasks. This shift from fine-tuning models to utilizing the world knowledge already encapsulated within pre-trained language models could redefine approaches to knowledge mining across various domains.

In summary, this paper provides pivotal insights into the capabilities of pre-trained language models for inferring commonsense knowledge, promoting an unsupervised approach that holds substantial promise for adapting to novel data without being constrained by predefined knowledge bases.

Authors (3)
  1. Joshua Feldman (3 papers)
  2. Joe Davison (5 papers)
  3. Alexander M. Rush (115 papers)
Citations (320)