
Topology-Preserving Scaling in Data Augmentation (2411.19512v1)

Published 29 Nov 2024 in math.AT, cs.IT, cs.LG, and math.IT

Abstract: We propose an algorithmic framework for dataset normalization in data augmentation pipelines that preserves topological stability under non-uniform scaling transformations. Given a finite metric space \( X \subset \mathbb{R}^n \) with Euclidean distance \( d_X \), we consider scaling transformations defined by scaling factors \( s_1, s_2, \ldots, s_n > 0 \). Specifically, we define a scaling function \( S \) that maps each point \( x = (x_1, x_2, \ldots, x_n) \in X \) to \[ S(x) = (s_1 x_1, s_2 x_2, \ldots, s_n x_n). \] Our main result establishes that the bottleneck distance \( d_B(D, D_S) \) between the persistence diagrams \( D \) of \( X \) and \( D_S \) of \( S(X) \) satisfies \[ d_B(D, D_S) \leq (s_{\max} - s_{\min}) \cdot \operatorname{diam}(X), \] where \( s_{\min} = \min_{1 \leq i \leq n} s_i \), \( s_{\max} = \max_{1 \leq i \leq n} s_i \), and \( \operatorname{diam}(X) \) is the diameter of \( X \). Based on this theoretical guarantee, we formulate an optimization problem to minimize the scaling variability \( \Delta_s = s_{\max} - s_{\min} \) under the constraint \( d_B(D, D_S) \leq \epsilon \), where \( \epsilon > 0 \) is a user-defined tolerance. We develop an algorithmic solution to this problem, ensuring that data augmentation via scaling transformations preserves essential topological features. We further extend our analysis to higher-dimensional homological features, alternative metrics such as the Wasserstein distance, and iterative or probabilistic scaling scenarios. Our contributions provide a rigorous mathematical framework for dataset normalization in data augmentation pipelines, ensuring that essential topological characteristics are maintained despite scaling transformations.
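The abstract's bound gives a simple sufficient condition for the tolerance constraint: if \( (s_{\max} - s_{\min}) \cdot \operatorname{diam}(X) \leq \epsilon \), then \( d_B(D, D_S) \leq \epsilon \) automatically. A minimal sketch of this idea, assuming the paper's algorithm enforces the bound by shrinking the spread of the scaling factors (the function name and the symmetric clipping strategy here are illustrative, not taken from the paper):

```python
import math
from itertools import combinations

def diameter(X):
    """Largest pairwise Euclidean distance in a finite point cloud X."""
    return max(math.dist(p, q) for p, q in combinations(X, 2))

def topology_safe_scaling(X, scales, eps):
    """Adjust per-axis scaling factors so the sufficient condition
    (s_max - s_min) * diam(X) <= eps holds; by the paper's bound this
    guarantees d_B(D, D_S) <= eps for the persistence diagrams.

    Clipping symmetrically about the midpoint of [s_min, s_max] is one
    possible heuristic (an assumption here, not the paper's algorithm).
    """
    d = diameter(X)
    allowed = eps / d                       # max permitted spread s_max - s_min
    s_min, s_max = min(scales), max(scales)
    if s_max - s_min <= allowed:
        return list(scales)                 # already within tolerance
    mid = (s_max + s_min) / 2
    lo, hi = mid - allowed / 2, mid + allowed / 2
    return [min(max(s, lo), hi) for s in scales]
```

For example, scaling the unit square (diameter \( \sqrt{2} \)) by factors \( (1, 2) \) with \( \epsilon = 0.5 \) forces the spread down to \( 0.5/\sqrt{2} \approx 0.354 \), so the clipped factors are roughly \( (1.323, 1.677) \) and the bound holds with equality.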


Authors (2)