
DRIFT: Data Reduction via Informative Feature Transformation - Generalization Begins Before Deep Learning Starts (2506.19734v1)

Published 24 Jun 2025 in cs.LG

Abstract: Modern deep learning architectures excel at optimization, but only after the data has entered the network. The true bottleneck lies in preparing the right input: minimal, salient, and structured in a way that reflects the essential patterns of the data. We propose DRIFT (Data Reduction via Informative Feature Transformation), a novel preprocessing technique inspired by vibrational analysis in physical systems, to identify and extract the most resonant modes of input data prior to training. Unlike traditional models that attempt to learn amidst both signal and noise, DRIFT mimics physics perception by emphasizing informative features while discarding irrelevant elements. The result is a more compact and interpretable representation that enhances training stability and generalization performance. In DRIFT, images are projected onto a low-dimensional basis formed by the spatial vibration mode shapes of plates, offering a physically grounded feature set. This enables neural networks to operate with drastically fewer input dimensions (~50 features on MNIST and fewer than 100 on CIFAR100) while achieving competitive classification accuracy. Extensive experiments on MNIST and CIFAR100 demonstrate DRIFT's superiority over standard pixel-based models and PCA in terms of training stability, resistance to overfitting, and generalization robustness. Notably, DRIFT displays minimal sensitivity to changes in batch size, network architecture, and image resolution, further establishing it as a resilient and efficient data representation strategy. This work shifts the focus from architecture engineering to input curation and underscores the power of physics-driven data transformations in advancing deep learning performance.
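To make the projection step in the abstract concrete, here is a minimal sketch of the general idea, not the authors' implementation. It assumes a simply supported rectangular plate, whose mode shapes are products of sines; the abstract does not state which boundary conditions or how many modes the paper actually uses. The function names (`plate_mode_basis`, `project`, `reconstruct`) and the 7 x 7 mode grid (49 features, chosen only to be close to the "~50 features on MNIST" figure) are illustrative assumptions.

```python
import numpy as np


def plate_mode_basis(height, width, n_modes_y, n_modes_x):
    """Return an orthonormal basis of sampled plate mode shapes.

    Assumption: simply supported rectangular plate, so mode (m, n) is
    sin(m*pi*y) * sin(n*pi*x) on the unit square. Sampling at interior
    grid points keeps the discrete modes mutually orthogonal.
    """
    y = np.arange(1, height + 1) / (height + 1)
    x = np.arange(1, width + 1) / (width + 1)
    modes = []
    for m in range(1, n_modes_y + 1):
        for n in range(1, n_modes_x + 1):
            shape = np.outer(np.sin(m * np.pi * y), np.sin(n * np.pi * x))
            # Normalize each mode to unit Frobenius norm.
            modes.append(shape.ravel() / np.linalg.norm(shape))
    return np.stack(modes)  # shape: (n_modes_y * n_modes_x, height * width)


def project(images, basis):
    """Map images of shape (batch, H, W) to modal coefficients (batch, n_modes)."""
    flat = images.reshape(images.shape[0], -1)
    return flat @ basis.T


def reconstruct(features, basis, height, width):
    """Approximate images from their modal coefficients."""
    return (features @ basis).reshape(-1, height, width)


# Usage on stand-in data with MNIST-sized images (random pixels, not real digits).
rng = np.random.default_rng(0)
images = rng.random((4, 28, 28))
basis = plate_mode_basis(28, 28, n_modes_y=7, n_modes_x=7)
features = project(images, basis)
print(features.shape)  # (4, 49)
```

Under these assumptions the basis rows are orthonormal, so projection is a single matrix product and the coefficients can be fed to a classifier in place of the 784 raw pixels. Other boundary conditions would give different mode shapes and would generally need a numerical eigen-solve instead of the closed-form sines used here.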


Authors (1)