InfFeed: Influence Functions as a Feedback to Improve the Performance of Subjective Tasks
Abstract: Influence functions have recently emerged as a tool for explaining deep neural models: they quantify how perturbing individual training instances might affect a test prediction. Our objectives in this paper are twofold. First, we incorporate influence functions as feedback into the model to improve its performance. Second, in a dataset extension exercise, we use influence functions to automatically identify data points that were initially 'silver' annotated by some existing method and need to be cross-checked (and corrected) by annotators to improve model performance. To meet these objectives, we introduce InfFeed, which uses influence functions to compute the influential instances for a target instance. Toward the first objective, we adjust the label of the target instance based on the label(s) of its influencer(s). In doing so, InfFeed outperforms state-of-the-art baselines (including LLMs) by a maximum macro F1-score margin of almost 4% for hate speech classification, 3.5% for stance classification, 3% for irony detection, and 2% for sarcasm detection. Toward the second objective, we show that manually re-annotating only those silver-annotated data points in the extension set that have a negative influence can immensely improve model performance, bringing it very close to the scenario where all the data points in the extension set have gold labels. This allows for a huge reduction in the number of data points that need to be manually annotated: out of the silver-annotated extension dataset, the influence function scheme picks up ~1/1000 points that need manual correction.
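The two uses of influence described in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the InfFeed implementation: it assumes a small L2-regularised logistic regression in place of a deep model, synthetic 2-D data, the classic Hessian-based influence estimate with an explicit Hessian solve, and a sign convention in which a positive score means that upweighting the training point is estimated to lower the target's loss. All function names (`influence_matrix`, `adjust_label`, `flag_for_reannotation`), the majority-vote rule, and the "summed influence below zero" criterion are hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, l2=0.1, lr=0.5, steps=2000):
    """L2-regularised logistic regression fitted by plain gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * (X.T @ (sigmoid(X @ w) - y) / len(X) + l2 * w)
    return w

def per_example_grads(X, y, w):
    """Log-loss gradient w.r.t. w for every row; shape (n, d)."""
    return (sigmoid(X @ w) - y)[:, None] * X

def hessian(X, w, l2=0.1):
    """Hessian of the regularised mean training loss (positive definite for l2 > 0)."""
    p = sigmoid(X @ w)
    return (X.T * (p * (1.0 - p))) @ X / len(X) + l2 * np.eye(X.shape[1])

def influence_matrix(X_tr, y_tr, X_tg, y_tg, w, l2=0.1):
    """Entry [i, j] = grad(target i)^T H^{-1} grad(train j).

    Assumed sign convention: positive means train point j is estimated to
    help target i, negative means it is estimated to hurt.
    """
    H = hessian(X_tr, w, l2)
    g_tr = per_example_grads(X_tr, y_tr, w)
    g_tg = per_example_grads(X_tg, y_tg, w)
    return g_tg @ np.linalg.solve(H, g_tr.T)

def adjust_label(infl_row, y_train, k=3):
    """Hypothetical rule: majority label among the k most helpful influencers."""
    top = np.argsort(-infl_row)[:k]
    return int(2 * y_train[top].sum() > k)

def flag_for_reannotation(infl):
    """Hypothetical rule: flag train points whose summed influence is negative."""
    return np.flatnonzero(infl.sum(axis=0) < 0)

# Synthetic data (an assumption of this sketch, not the paper's datasets).
rng = np.random.default_rng(0)
X_tr = rng.normal(size=(200, 2))
y_gold = (X_tr.sum(axis=1) > 0).astype(float)
y_silver = y_gold.copy()
noisy = rng.choice(200, size=10, replace=False)
y_silver[noisy] = 1.0 - y_silver[noisy]  # simulate wrong silver labels

X_val = rng.normal(size=(50, 2))
y_val = (X_val.sum(axis=1) > 0).astype(float)

w = fit_logreg(X_tr, y_silver)
infl = influence_matrix(X_tr, y_silver, X_val, y_val, w)
flagged = flag_for_reannotation(infl)
suggested = adjust_label(infl[0], y_silver, k=3)
print(len(flagged), "of", len(X_tr), "silver points flagged for manual review")
```

For a deep model the explicit Hessian solve is not feasible, so this sketch should be read only as a statement of the quantity being estimated; the target gradient here also relies on the target's current label, which is a further assumption.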