Learning with Noisy Foundation Models (2403.06869v1)
Abstract: Foundation models are usually pre-trained on large-scale datasets and then adapted to downstream tasks through tuning. However, the large-scale pre-training datasets, often inaccessible or too expensive to curate, can contain label noise that may adversely affect the generalization of the model and pose unexpected risks. This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets and then effectively mitigate its impact on downstream tasks. Specifically, through extensive experiments with fully-supervised and image-text contrastive pre-training on synthetically noised ImageNet-1K, YFCC15M, and CC12M datasets, we demonstrate that, while slight noise in pre-training can benefit in-domain (ID) performance, where the training and testing data share a similar distribution, it always deteriorates out-of-domain (OOD) performance, where the training and testing distributions differ significantly. These observations hold regardless of pre-training dataset scale, pre-training noise type, model architecture, pre-training objective, downstream tuning method, and downstream application. We empirically ascertain that the reason is that pre-training noise shapes the feature space differently. We then propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization, and that is applicable in both parameter-efficient and black-box tuning settings. For evaluation, we additionally conduct extensive experiments on popular vision and language models, including APIs, that are pre-trained with supervised and self-supervised objectives on realistic noisy data. Our analysis and results demonstrate the importance of this novel and fundamental research direction, which we term Noisy Model Learning.
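The abstract only names NMTune's core idea — re-shaping (affinely transforming) the frozen pre-trained feature space during downstream tuning — without spelling out an objective. As a rough illustration under that reading, the sketch below trains a lightweight MLP on top of frozen backbone features with a classification loss plus feature-space regularizers (consistency with the original features, covariance decorrelation, and singular-value-spectrum shaping). The class and function names, the specific regularizers, and the loss weights are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NMTuneSketch(nn.Module):
    """Hypothetical sketch of NMTune-style tuning: a small MLP re-shapes
    frozen pre-trained features before a linear classifier, so the noisy
    backbone itself never receives gradients (parameter-efficient and
    black-box friendly)."""

    def __init__(self, feat_dim: int, num_classes: int, hidden: int = 1024):
        super().__init__()
        self.transform = nn.Sequential(   # lightweight feature transform
            nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, feat_dim)
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, feats: torch.Tensor):
        z = self.transform(feats)         # re-shaped feature space
        return self.classifier(z), z

def feature_space_losses(z: torch.Tensor, feats: torch.Tensor):
    """Assumed regularizers on the transformed features z (batch x dim)."""
    # Consistency: keep transformed features close to the pre-trained ones,
    # so pre-trained knowledge is not discarded.
    l_con = F.mse_loss(F.normalize(z, dim=-1), F.normalize(feats, dim=-1))
    # Covariance: decorrelate feature dimensions (VICReg-style) to avoid
    # collapse onto a few directions.
    zc = z - z.mean(dim=0)
    cov = (zc.T @ zc) / (z.shape[0] - 1)
    off_diag = cov - torch.diag(torch.diagonal(cov))
    l_cov = off_diag.pow(2).sum() / z.shape[1]
    # Spectrum shaping: encourage a more dominant top singular value, which
    # prior work associates with transferability; this exact term is an
    # assumption, not necessarily the paper's.
    s = torch.linalg.svdvals(zc)
    l_svd = -(s[0] / s.sum())
    return l_con, l_cov, l_svd

# Usage sketch: feats stand in for output of a frozen (possibly API-only) backbone.
model = NMTuneSketch(feat_dim=768, num_classes=10)
feats = torch.randn(32, 768)
labels = torch.randint(0, 10, (32,))
logits, z = model(feats)
l_con, l_cov, l_svd = feature_space_losses(z, feats)
loss = F.cross_entropy(logits, labels) + l_con + 0.01 * l_cov + 0.01 * l_svd
loss.backward()
```

Because the backbone only has to expose features, the same recipe applies whether the model is tuned parameter-efficiently or queried as a black box.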
Authors: Hao Chen, Jindong Wang, Zihan Wang, Ran Tao, Hongxin Wei, Xing Xie, Masashi Sugiyama, Bhiksha Raj