
ResBit: Residual Bit Vector for Categorical Values (2309.17196v4)

Published 29 Sep 2023 in cs.LG

Abstract: One-hot vectors are widely used in machine learning to represent discrete/categorical data because of their simplicity and intuitiveness. However, one-hot vectors suffer from a linear increase in dimensionality, posing computational and memory challenges, especially for datasets containing numerous categories. In this paper, we focus on tabular data generation and reveal that multinomial diffusion suffers from mode collapse when the cardinality is high. Moreover, due to the limitations of one-hot vectors, training takes longer in such situations. To address these issues, we propose Residual Bit Vectors (ResBit), a technique for densely representing categorical data. ResBit extends analog bits and overcomes their limitations when applied to tabular data generation. Our experiments demonstrate that ResBit not only accelerates training but also maintains performance comparable to that obtained before applying it. Furthermore, our results indicate that many existing methods struggle with high-cardinality data, underscoring the need for lower-dimensional representations such as ResBit and latent vectors.

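ResBit is described as an extension of analog bits, which replace a one-hot vector with the binary expansion of the category ID mapped to real values in {-1, +1}. The following is a minimal sketch of that underlying analog-bits encoding only, not the authors' ResBit implementation; the function names are illustrative:

```python
def int_to_analog_bits(x, num_bits):
    # Binary expansion of the category ID (little-endian),
    # with each bit in {0, 1} mapped to {-1.0, +1.0}.
    return [2.0 * ((x >> i) & 1) - 1.0 for i in range(num_bits)]

def analog_bits_to_int(bits):
    # Decode by thresholding each analog bit at 0
    # and reassembling the integer.
    return sum(1 << i for i, b in enumerate(bits) if b > 0)

# A 16-category variable needs only 4 analog bits
# instead of a 16-dimensional one-hot vector.
encoded = int_to_analog_bits(5, num_bits=4)   # [1.0, -1.0, 1.0, -1.0]
decoded = analog_bits_to_int(encoded)         # 5
```

With b bits this covers 2^b categories, so the representation grows logarithmically rather than linearly with cardinality, which is the dimensionality saving the abstract refers to.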
