- The paper introduces the Uniform framework, which unifies knowledge from diverse pre-trained models via novel feature-level and logit-level voting mechanisms.
- It employs feature alignment with reconstruction losses and pseudo-label-based distillation to reconcile conflicts among teacher predictions.
- Experimental results across 11 benchmark datasets demonstrate superior scalability and performance improvements over traditional knowledge distillation methods.
Technical advances in deep learning have led to a proliferation of publicly available pre-trained models, each offering unique insights into the real world. The paper "UNIFORM: Unifying Knowledge from Large-scale and Diverse Pre-trained Models" introduces a novel framework called Uniform, which effectively integrates knowledge from heterogeneous pre-trained models to enhance unsupervised object recognition.
Introduction
The rapid accumulation of pre-trained models on platforms like HuggingFace has opened avenues for harnessing the diverse expertise embedded within them. Despite this wealth of knowledge, challenges arise from the variety of network architectures and the heterogeneity of the data on which these models were trained. Traditional approaches such as model merging, mixture-of-experts, and knowledge distillation impose homogeneity constraints on architecture or training data, limiting which models can be combined and often introducing bias.
Uniform addresses these limitations by proposing a framework that enables knowledge transfer from both homogeneous models—those sharing architecture and label space—and heterogeneous models with distinct architectures or data classes.
Figure 1: The dramatic growth in publicly available models and an overview of Uniform’s approach to knowledge transfer.
Methodology
Uniform's methodology hinges on two innovative voting mechanisms designed to extract consensus knowledge from a range of teacher models at both the feature and logit levels.
Feature Voting and Transfer
The feature-level knowledge integration unifies the latent feature spaces of diverse teachers into a common space shared with the student model. This is achieved using encoders that project teacher features into an aligned space, regularized with reconstruction losses to prevent degeneration. The challenge of sign conflicts, where features with opposite signs cancel during averaging and yield misleading near-zero vectors, is mitigated through a voting-based mechanism that aligns feature directions before aggregation.
Figure 2: Mitigating sign conflicts through a two-stage framework involving voting and feature transfer.
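To make the feature-level pipeline concrete, here is a minimal PyTorch sketch. It assumes each teacher has its own encoder/decoder pair projecting into a shared space (with an MSE reconstruction loss guarding against degenerate projections) and resolves sign conflicts by per-dimension majority vote; all names are illustrative, and the paper's exact losses and voting rule may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TeacherProjector(nn.Module):
    """Maps one teacher's feature space into the shared space.

    The decoder reconstructs the original feature, so the encoder cannot
    collapse everything to a trivial point (reconstruction regularization).
    """
    def __init__(self, teacher_dim: int, shared_dim: int):
        super().__init__()
        self.encoder = nn.Linear(teacher_dim, shared_dim)
        self.decoder = nn.Linear(shared_dim, teacher_dim)

    def forward(self, feats: torch.Tensor):
        z = self.encoder(feats)                        # (B, shared_dim)
        recon_loss = F.mse_loss(self.decoder(z), feats)
        return z, recon_loss

def vote_and_aggregate(teacher_feats: torch.Tensor) -> torch.Tensor:
    """Sign-voting aggregation of already-aligned teacher features.

    teacher_feats: (T, B, D) stack of T teachers' projected features.
    Per-dimension sign disagreements are resolved by majority vote, so
    opposing values cannot cancel to a misleading zero during averaging.
    """
    sign_votes = torch.sign(teacher_feats).sum(dim=0)            # (B, D)
    majority = torch.where(sign_votes >= 0,
                           torch.ones_like(sign_votes),
                           -torch.ones_like(sign_votes))         # ties go to +
    agrees = (torch.sign(teacher_feats) == majority).float()     # (T, B, D)
    agreed_sum = (teacher_feats * agrees).sum(dim=0)
    counts = agrees.sum(dim=0).clamp(min=1.0)                    # avoid div by 0
    return agreed_sum / counts                                   # consensus target
```

In a training step, each teacher's feature would pass through its own `TeacherProjector`, the reconstruction losses would be added to the objective, and the student's feature would be regressed toward the voted consensus (e.g., with an MSE loss).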
Logits Voting and Transfer
At the logits level, discrepancies in prediction distributions among teachers are tackled by focusing on class votes. Pseudo-labels are derived by identifying the most frequent class prediction among teachers, thus emphasizing the most likely classes in the transfer process. This selective knowledge distillation reduces student confusion stemming from conflicting teacher outputs.
Figure 3: Decomposing knowledge transfer into pseudo and non-pseudo logits to handle distribution conflicts.
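The logit-level transfer admits a similarly compact sketch. The snippet below assumes all teachers share the student's label space (Uniform also handles heterogeneous label sets, which this sketch omits); the majority-vote pseudo-labeling follows the description above, while the selective loss that distills only from agreeing teachers is one plausible realization rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def vote_pseudo_labels(teacher_logits: torch.Tensor) -> torch.Tensor:
    """teacher_logits: (T, B, C). Returns, per sample, the class predicted
    by the largest number of teachers (the pseudo-label)."""
    hard_preds = teacher_logits.argmax(dim=-1)      # (T, B)
    return torch.mode(hard_preds, dim=0).values     # (B,)

def selective_kd_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      pseudo: torch.Tensor,
                      temperature: float = 4.0) -> torch.Tensor:
    """Distill only from teachers whose top-1 prediction matches the voted
    pseudo-label, so conflicting teachers do not confuse the student."""
    hard_preds = teacher_logits.argmax(dim=-1)                  # (T, B)
    agree = (hard_preds == pseudo.unsqueeze(0)).float()         # (T, B)
    probs = F.softmax(teacher_logits / temperature, dim=-1)     # (T, B, C)
    weights = agree.unsqueeze(-1)                               # (T, B, 1)
    # Average the soft targets of agreeing teachers only.
    target = (probs * weights).sum(0) / weights.sum(0).clamp(min=1.0)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_student, target,
                    reduction="batchmean") * temperature ** 2
```

Because the pseudo-label is the mode of the teachers' top-1 predictions, at least one teacher always agrees with it, so the masked average is well defined. The pseudo/non-pseudo decomposition illustrated in Figure 3 could be layered on top by weighting the pseudo-class and remaining-class terms of the loss separately.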
Experimental Results
The efficacy of Uniform was evaluated across 11 benchmark datasets, including CUB200 and Stanford Dogs, with student models showing significant performance improvements. Uniform consistently outperformed traditional KD methods such as CFL and OFA, and demonstrated superior scalability by absorbing knowledge from over one hundred pre-trained teachers without performance saturation.
Experiments varying the number of descriptive and predictive teachers showed that Uniform's voting mechanisms allow it to leverage a broad range of teacher models efficiently, maximizing the benefit drawn from freely available public knowledge.

Figure 4: Performance of Uniform as the number of public descriptive teachers increases under a multi-dataset configuration.
Conclusion
Uniform marks a step forward in harnessing the collective wisdom of publicly accessible pre-trained models for unsupervised tasks. By prioritizing consensus through novel voting strategies, the framework adeptly navigates the complexities of heterogeneous model architectures and label spaces. Its scalable performance across varied datasets highlights its potential to broaden the applicability and generalizability of vision models.
This framework paves the way for future explorations into integrating knowledge from even larger model repositories and extending these methodologies to other domains beyond visual recognition.