Papers
Topics
Authors
Recent
Search
2000 character limit reached

Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates

Published 7 Oct 2021 in cs.CL | (2110.03212v1)

Abstract: Among the most critical limitations of deep learning NLP models are their lack of interpretability, and their reliance on spurious correlations. Prior work proposed various approaches to interpreting the black-box models to unveil the spurious correlations, but the research was primarily used in human-computer interaction scenarios. It still remains underexplored whether or how such model interpretations can be used to automatically "unlearn" confounding features. In this work, we propose influence tuning--a procedure that leverages model interpretations to update the model parameters towards a plausible interpretation (rather than an interpretation that relies on spurious patterns in the data) in addition to learning to predict the task labels. We show that in a controlled setup, influence tuning can help deconfounding the model from spurious patterns in data, significantly outperforming baseline methods that use adversarial training.

Authors (2)
Citations (29)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.