
A Comprehensive Survey on Data Augmentation (2405.09591v3)

Published 15 May 2024 in cs.LG and cs.AI

Abstract: Data augmentation comprises techniques that generate high-quality artificial data by manipulating existing data samples. By leveraging data augmentation, AI models achieve markedly improved applicability on tasks with scarce or imbalanced datasets, thereby substantially enhancing their generalization capabilities. Existing literature surveys focus on a single data modality and categorize methods from modality-specific, operation-centric perspectives; this lacks a consistent summary of data augmentation methods across modalities and limits understanding of how existing data samples serve the augmentation process. To bridge this gap, we propose a more enlightening taxonomy that encompasses data augmentation techniques for the common data modalities. Specifically, from a data-centric perspective, this survey proposes a modality-independent taxonomy based on how the intrinsic relationships between data samples are exploited, distinguishing single-wise, pair-wise, and population-wise augmentation methods. Additionally, we categorize data augmentation methods across five data modalities through a unified inductive approach.
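The three data-centric categories from the abstract can be illustrated concretely. The sketch below is not from the paper; it uses standard stand-in techniques for each category (Gaussian jitter for single-wise, mixup-style interpolation for pair-wise, and a SMOTE-style nearest-neighbor synthesis for population-wise), with hypothetical function names chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def single_wise(x, noise_scale=0.1):
    """Single-wise: augment one sample in isolation (here, additive noise)."""
    return x + rng.normal(0.0, noise_scale, size=x.shape)

def pair_wise(x1, x2, alpha=0.2):
    """Pair-wise: combine two samples, mixup-style convex interpolation."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1.0 - lam) * x2

def population_wise(X, k=2):
    """Population-wise: synthesize from the sample population,
    SMOTE-style interpolation toward a random near neighbor."""
    i = rng.integers(len(X))
    d = np.linalg.norm(X - X[i], axis=1)       # distances to all samples
    neighbors = np.argsort(d)[1:k + 1]         # k nearest, excluding self
    j = rng.choice(neighbors)
    gap = rng.random()                         # position along the segment
    return X[i] + gap * (X[j] - X[i])

X = rng.normal(size=(8, 4))                    # toy population: 8 samples, 4 features
aug1 = single_wise(X[0])
aug2 = pair_wise(X[0], X[1])
aug3 = population_wise(X)
print(aug1.shape, aug2.shape, aug3.shape)      # (4,) (4,) (4,)
```

The point of the taxonomy is visible in the signatures alone: single-wise methods need one sample, pair-wise methods need two, and population-wise methods need access to the whole sample set, regardless of modality.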

References (231)
  1. A dynamic histogram equalization for image contrast enhancement. IEEE transactions on consumer electronics 53, 2 (2007), 593–600.
  2. Do not have enough data? Deep learning to the rescue!. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 7383–7390.
  3. Jacob Andreas. 2019. Good-enough compositional data augmentation. arXiv preprint arXiv:1904.09545 (2019).
  4. DiffWire: Inductive Graph Rewiring via the Lovász Bound. In Learning on Graphs Conference. PMLR, 15–1.
  5. Unsupervised neural machine translation. In 6th International Conference on Learning Representations, ICLR 2018.
  6. Underwater Ambient-Noise Removing GAN Based on Magnitude and Phase Spectra. IEEE Access 9 (2021), 24513–24530.
  7. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015.
  8. Scarf: Self-Supervised Contrastive Learning using Random Feature Corruption. In International Conference on Learning Representations.
  9. Salvador V Balkus and Donghui Yan. 2022. Improving short text classification with augmented data using GPT-3. Natural Language Engineering (2022), 1–30.
  10. Looking beyond appearances: Synthetic training data for deep cnns in re-identification. Computer Vision and Image Understanding 167 (2018), 50–62.
  11. A survey on data augmentation for text classification. Comput. Surveys 55, 7 (2022), 1–39.
  12. Accurate medium-range global weather forecasting with 3D neural networks. Nature 619, 7970 (2023), 533–538.
  13. Make heterophily graphs better fit gnn: A graph rewiring approach. arXiv preprint arXiv:2209.08264 (2022).
  14. Deep neural networks and tabular data: A survey. IEEE Transactions on Neural Networks and Learning Systems (2022).
  15. Gan augmentation: Augmenting training data using generative adversarial networks. arXiv preprint arXiv:1810.10863 (2018).
  16. Generating sentences from a continuous space. arXiv preprint arXiv:1511.06349 (2015).
  17. Language models are few-shot learners. Advances in neural information processing systems 33 (2020), 1877–1901.
  18. SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research 16 (2002), 321–357.
  19. Lexical-constraint-aware neural machine translation via data augmentation. In Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence. 3587–3593.
  20. DoubleMix: Simple Interpolation-Based Data Augmentation for Text Classification. In Proceedings of the 29th International Conference on Computational Linguistics. 4622–4632.
  21. FakeTables: Using GANs to Generate Functional Dependency Preserving Tables with Bounded Real Data.. In IJCAI. 2074–2080.
  22. Automatic image cropping: A computational complexity study. In Proceedings of the IEEE conference on computer vision and pattern recognition. 507–515.
  23. ExcelFormer: A Neural Network Surpassing GBDTs on Tabular Data. arXiv preprint arXiv:2301.02819 (2023).
  24. Gridmask data augmentation. arXiv preprint arXiv:2001.04086 (2020).
  25. Iterative deep graph learning for graph neural networks: Better and robust node embeddings. Advances in neural information processing systems 33 (2020), 19314–19326.
  26. Learning to photograph. In Proceedings of the 18th ACM international conference on Multimedia. 291–300.
  27. Heng-Da Cheng and XJ Shi. 2004. A simple and effective histogram equalization approach to image enhancement. Digital signal processing 14, 2 (2004), 158–170.
  28. Remix: rebalanced mixup. In Computer Vision–ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020, Proceedings, Part VI 16. Springer, 95–110.
  29. Self-adaptive image cropping for small displays. IEEE Transactions on Consumer Electronics 53, 4 (2007), 1622–1627.
  30. Multi-column deep neural networks for image classification. In 2012 IEEE conference on computer vision and pattern recognition. IEEE, 3642–3649.
  31. High-performance neural networks for visual object classification. arXiv preprint arXiv:1102.0183 (2011).
  32. STL: A seasonal-trend decomposition. J. Off. Stat 6, 1 (1990), 3–73.
  33. Improving Automated Evaluation of Student Text Responses Using GPT-3.5 for Text Data Augmentation. In International Conference on Artificial Intelligence in Education. Springer, 217–228.
  34. Autoaugment: Learning augmentation strategies from data. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 113–123.
  35. Randaugment: Practical automated data augmentation with a reduced search space. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops. 702–703.
  36. Multi-scale convolutional neural networks for time series classification. arXiv preprint arXiv:1603.06995 (2016).
  37. Sajad Darabi and Yotam Elor. 2021. Synthesising multi-modal minority samples for tabular data. arXiv preprint arXiv:2105.08204 (2021).
  38. Contrastive mixup: Self-and semi-supervised learning for tabular domain. arXiv preprint arXiv:2108.12296 (2021).
  39. Mathieu Dehouck and Carlos Gómez-Rodríguez. 2020. Data augmentation via subtree swapping for dependency parsing of low-resource languages. In 28th International Conference on Computational Linguistics. International Committee on Computational Linguistics; International …, 3818–3830.
  40. Juan Manuel Davila Delgado and Lukumon Oyedele. 2021. Deep learning with small datasets: using autoencoders to address limited datasets in construction management. Applied Soft Computing 112 (2021), 107836.
  41. Terrance DeVries and Graham W Taylor. 2017a. Dataset augmentation in feature space. arXiv preprint arXiv:1702.05538 (2017).
  42. Terrance DeVries and Graham W Taylor. 2017b. Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552 (2017).
  43. Data augmentation for deep graph learning: A survey. ACM SIGKDD Explorations Newsletter 24, 2 (2022), 61–77.
  44. Adversarial Audio Synthesis. In International Conference on Learning Representations.
  45. Learning Enhanced Representation for Tabular Data via Neighborhood Propagation. Advances in Neural Information Processing Systems 35 (2022), 16373–16384.
  46. Syntax-aware data augmentation for neural machine translation. IEEE/ACM Transactions on Audio, Speech, and Language Processing (2023).
  47. Guiding Generative Language Models for Data Augmentation in Few-Shot Text Classification. In Proceedings of the Fourth Workshop on Data Science with Human-in-the-Loop (Language Advances). 51–63.
  48. Justin Engelmann and Stefan Lessmann. 2021. Conditional Wasserstein GAN-based oversampling of tabular data for imbalanced learning. Expert Systems with Applications 174 (2021), 114582.
  49. Real-valued (medical) time series generation with recurrent conditional gans. arXiv preprint arXiv:1706.02633 (2017).
  50. Autofs: Automated feature selection via diversity-aware interactive reinforcement learning. In 2020 IEEE International Conference on Data Mining (ICDM). IEEE, 1008–1013.
  51. Semi-Supervised Learning with Data Augmentation for Tabular Data. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management. 3928–3932.
  52. Dropmessage: Unifying random dropping for graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. 4267–4275.
  53. A Survey of Data Augmentation Approaches for NLP. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. 968–988.
  54. Graph random neural networks for semi-supervised learning on graphs. Advances in neural information processing systems 33 (2020), 22092–22103.
  55. Joao Fonseca and Fernando Bacao. 2023. Tabular and latent space synthetic data generation: a literature review. Journal of Big Data 10, 1 (2023), 115.
  56. Generating synthetic time series to augment sparse datasets. In 2017 IEEE international conference on data mining (ICDM). IEEE, 865–870.
  57. Learning discrete structures for graph neural networks. In International conference on machine learning. PMLR, 1972–1982.
  58. Milking cowmask for semi-supervised image classification. arXiv preprint arXiv:2003.12022 (2020).
  59. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing 321 (2018), 321–331.
  60. Virtual worlds as proxy for multi-object tracking analysis. In Proceedings of the IEEE conference on computer vision and pattern recognition. 4340–4349.
  61. Robusttad: Robust time series anomaly detection via decomposition and convolutional neural networks. arXiv preprint arXiv:2002.09545 (2020).
  62. Diffusion improves graph learning. Advances in neural information processing systems 32 (2019).
  63. Image style transfer using convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 2414–2423.
  64. Neural message passing for quantum chemistry. In International conference on machine learning. PMLR, 1263–1272.
  65. Generative adversarial nets. Advances in neural information processing systems 27 (2014).
  66. Selective text augmentation with word roles for low-resource text classification. arXiv preprint arXiv:2209.01560 (2022).
  67. Hongyu Guo. 2020. Nonlinear mixup: Out-of-manifold data augmentation for text classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 4044–4051.
  68. Hongyu Guo and Yongyi Mao. 2021. ifMixup: Interpolating Graph Pair to Regularize Graph Classification. arXiv preprint arXiv:2110.09344 (2021).
  69. Augmenting data with mixup for sentence classification: An empirical study. arXiv preprint arXiv:1905.08941 (2019).
  70. Homophily-oriented Heterogeneous Graph Rewiring. In Proceedings of the ACM Web Conference 2023. 511–522.
  71. Performance of a deep neural network algorithm based on a small medical image dataset: incremental impact of 3D-to-2D reformation combined with novel data augmentation, photometric conversion, or transfer learning. Journal of digital imaging 33 (2020), 431–438.
  72. G-mixup: Graph data augmentation for graph classification. In International Conference on Machine Learning. PMLR, 8230–8248.
  73. Kaveh Hassani and Amir Hosein Khasahmadi. 2020. Contrastive multi-view representation learning on graphs. In International conference on machine learning. PMLR, 4116–4126.
  74. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770–778.
  75. Augmix: A simple data processing method to improve robustness and uncertainty. arXiv preprint arXiv:1912.02781 (2019).
  76. Population based augmentation: Efficient learning of augmentation policy schedules. In International conference on machine learning. PMLR, 2731–2741.
  77. Denoising diffusion probabilistic models. Advances in neural information processing systems 33 (2020), 6840–6851.
  78. Chris Hokamp and Qun Liu. 2017. Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 1535–1546.
  79. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society of London. Series A: mathematical, physical and engineering sciences 454, 1971 (1998), 903–995.
  80. Combining Label Propagation and Simple Models out-performs Graph Neural Networks. In International Conference on Learning Representations.
  81. Improving Corruption Robustness with Random Erasing in the Frequency Domain. In 2023 International Conference on Electronics, Information, and Communication (ICEIC). IEEE, 1–3.
  82. Data Augmentation techniques in time series domain: a survey and taxonomy. Neural Computing and Applications 35, 14 (2023), 10123–10145.
  83. Hiroshi Inoue. 2018. Data augmentation by pairing samples for images classification. arXiv preprint arXiv:1801.02929 (2018).
  84. Label propagation for deep semi-supervised learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 5070–5079.
  85. Brian Kenji Iwana and Seiichi Uchida. 2021. An empirical survey of data augmentation for time series classification with neural networks. Plos one 16, 7 (2021), e0254841.
  86. Spatial transformer networks. Advances in neural information processing systems 28 (2015).
  87. Navdeep Jaitly and Geoffrey E Hinton. 2013. Vocal tract length perturbation (VTLP) improves speech recognition. In Proc. ICML Workshop on Deep Learning for Audio, Speech and Language, Vol. 117. 21.
  88. Semi-supervised learning with graph learning-convolutional networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 11313–11320.
  89. Sub-graph contrast for scalable self-supervised graph representation learning. In 2020 IEEE international conference on data mining (ICDM). IEEE, 222–231.
  90. Is bert really robust? a strong baseline for natural language attack on text classification and entailment. In Proceedings of the AAAI conference on artificial intelligence, Vol. 34. 8018–8025.
  91. Multi-scale contrastive siamese networks for self-supervised graph representation learning. arXiv preprint arXiv:2105.05682 (2021).
  92. Graph structure learning for robust graph neural networks. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining. 66–74.
  93. Highly accurate protein structure prediction with AlphaFold. Nature 596, 7873 (2021), 583–589.
  94. Patchshuffle regularization. arXiv preprint arXiv:1707.07103 (2017).
  95. GRATIS: GeneRAting TIme Series with diverse and controllable characteristics. Statistical Analysis and Data Mining: The ASA Data Science Journal 13, 4 (2020), 354–376.
  96. AEDA: An Easier Data Augmentation Technique for Text Classification. In Findings of the Association for Computational Linguistics: EMNLP 2021. 2748–2754.
  97. MR image synthesis using generative adversarial networks for Parkinson’s disease classification. In Proceedings of International Conference on Artificial Intelligence and Applications: ICAIA 2020. Springer, 317–327.
  98. Zekarias T Kefato and Sarunas Girdzijauskas. 2021. Self-supervised graph neural networks without explicit negative sampling. arXiv preprint arXiv:2103.14958 (2021).
  99. On the stability of graph convolutional neural networks under edge rewiring. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 8513–8517.
  100. A comprehensive survey of recent trends in deep learning for digital images augmentation. Artificial Intelligence Review (2022), 1–27.
  101. Feature engineering for predictive modeling using reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32.
  102. S-Mixup: Structural Mixup for Graph Neural Networks. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management. 4003–4007.
  103. Co-mixup: Saliency guided joint mixup with supermodular diversity. arXiv preprint arXiv:2102.03065 (2021).
  104. Puzzle mix: Exploiting saliency and local statistics for optimal mixup. In International Conference on Machine Learning. PMLR, 5275–5285.
  105. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 25 (2012).
  106. Krishna Kumar Singh and Yong Jae Lee. 2017. Hide-and-seek: Forcing a network to be meticulous for weakly-supervised object and action localization. In Proceedings of the IEEE International Conference on Computer Vision. 3524–3533.
  107. Deblurgan: Blind motion deblurring using conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 8183–8192.
  108. Data augmentation for time series classification using convolutional neural networks. In ECML/PKDD workshop on advanced analytics and learning on temporal data.
  109. Gradient-based learning applied to document recognition. Proc. IEEE 86, 11 (1998), 2278–2324.
  110. Smoothmix: a simple yet effective data augmentation to train robust classifiers. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops. 756–757.
  111. Smart augmentation learning an optimal data augmentation strategy. Ieee Access 5 (2017), 5858–5869.
  112. A2-RL: Aesthetics aware reinforcement learning for image cropping. In Proceedings of the IEEE conference on computer vision and pattern recognition. 8193–8201.
  113. A closed-form solution to photorealistic image stylization. In Proceedings of the European conference on computer vision (ECCV). 453–468.
  114. Fast autoaugment. Advances in Neural Information Processing Systems 32 (2019).
  115. Graph Mixup with Soft Alignments. arXiv preprint arXiv:2306.06788 (2023).
  116. Generative diffusion models on graphs: Methods and applications. arXiv preprint arXiv:2302.02591 (2023).
  117. Dagad: Data augmentation for graph anomaly detection. In 2022 IEEE International Conference on Data Mining (ICDM). IEEE, 259–268.
  118. Insertion, deletion, or substitution? Normalizing text messages without pre-categorization nor supervision. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. 71–76.
  119. Graph rationalization with environment-based augmentations. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 1069–1078.
  120. Automating feature subspace exploration via multi-agent reinforcement learning. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 207–215.
  121. Automated feature selection: A reinforcement learning perspective. IEEE Transactions on Knowledge and Data Engineering (2021).
  122. Interactive reinforced feature selection with traverse strategy. Knowledge and Information Systems 65, 5 (2023), 1935–1962.
  123. Efficient reinforced feature selection via early stopping traverse strategy. In 2021 IEEE International Conference on Data Mining (ICDM). IEEE, 399–408.
  124. Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 9031–9041.
  125. Local augmentation for graph neural networks. In International Conference on Machine Learning. PMLR, 14054–14072.
  126. GOGGLE: Generative modelling for tabular data by learning relational structure. In The Eleventh International Conference on Learning Representations.
  127. Towards unsupervised deep graph structure learning. In Proceedings of the ACM Web Conference 2022. 1392–1403.
  128. Automix: Unveiling the power of mixup for stronger classifiers. In European Conference on Computer Vision. Springer, 441–458.
  129. Smart rewiring for network robustness. Journal of Complex networks 1, 2 (2013), 150–159.
  130. Insnet: An efficient, flexible, and performant insertion-based text generation model. Advances in Neural Information Processing Systems 35 (2022), 7011–7023.
  131. Learning to drop: Robust graph neural network via topological denoising. In Proceedings of the 14th ACM international conference on web search and data mining. 779–787.
  132. Matthew Ma and Jinhong K Guo. 2004. Automatic image cropping for mobile device with built-in camera. In First IEEE Consumer Communications and Networking Conference, 2004. CCNC 2004. IEEE, 710–711.
  133. Met: Masked encoding for tabular data. arXiv preprint arXiv:2206.08564 (2022).
  134. Controlled Text Generation for Data Augmentation in Intelligent Artificial Agents. EMNLP-IJCNLP 2019 (2019), 90.
  135. Vukosi Marivate and Tshephisho Sefara. 2020. Improving short text classification through global augmentation methods. In Machine Learning and Knowledge Extraction: 4th IFIP TC 5, TC 12, WG 8.4, WG 8.9, WG 12.9 International Cross-Domain Conference, CD-MAKE 2020, Dublin, Ireland, August 25–28, 2020, Proceedings 4. Springer, 385–399.
  136. Scenenet rgb-d: Can 5m synthetic images beat generic imagenet pre-training on indoor segmentation?. In Proceedings of the IEEE International Conference on Computer Vision. 2678–2687.
  137. Agnieszka Mikołajczyk and Michał Grochowski. 2018. Data augmentation for improving deep learning in image classification problem. In 2018 international interdisciplinary PhD workshop (IIPhDW). IEEE, 117–122.
  138. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
  139. George A Miller. 1995. WordNet: a lexical database for English. Commun. ACM 38, 11 (1995), 39–41.
  140. Syntactic data augmentation increases robustness to inference heuristics. In 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020. Association for Computational Linguistics (ACL), 2339–2352.
  141. Alhassan Mumuni and Fuseini Mumuni. 2022. Data augmentation: A comprehensive survey of modern approaches. Array (2022), 100258.
  142. Data augmentation based on vowel stretch for improving children’s speech recognition. In 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE, 502–508.
  143. Data augmentation using empirical mode decomposition on neural networks to classify impact noise in vehicle. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 731–735.
  144. Data augmentation approaches for improving animal audio classification. Ecological Informatics 57 (2020), 101084.
  145. Sensation-based photo cropping. In Proceedings of the 17th ACM international conference on Multimedia. 669–672.
  146. Soma Onishi and Shoya Meguro. 2023. Rethinking Data Augmentation for Tabular Data in Deep Learning. arXiv preprint arXiv:2305.10308 (2023).
  147. R OpenAI. 2023. Gpt-4 technical report. arxiv 2303.08774. View in Article 2 (2023), 13.
  148. Specaugment: A simple data augmentation method for automatic speech recognition. arXiv preprint arXiv:1904.08779 (2019).
  149. Graph transplant: Node saliency-guided graph mixup with local structure preservation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 7966–7974.
  150. Data synthesis based on generative adversarial networks. arXiv preprint arXiv:1806.03384 (2018).
  151. A global averaging method for dynamic time warping, with applications to clustering. Pattern recognition 44, 3 (2011), 678–693.
  152. Neural paraphrase generation with stacked residual LSTM networks. arXiv preprint arXiv:1610.03098 (2016).
  153. Gcc: Graph contrastive coding for graph neural network pre-training. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining. 1150–1160.
  154. Textual data augmentation for efficient active learning on tiny datasets. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 7400–7410.
  155. Improving language understanding by generative pre-training. (2018).
  156. Language models are unsupervised multitask learners. OpenAI blog 1, 8 (2019), 9.
  157. Optimal small kernels for edge detection. In [1990] Proceedings. 10th International Conference on Pattern Recognition, Vol. 2. IEEE, 57–63.
  158. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 10684–10695.
  159. DropEdge: Towards Deep Graph Convolutional Networks on Node Classification. In International Conference on Learning Representations.
  160. Gözde Gül Şahin and Mark Steedman. 2018. Data Augmentation via Dependency Tree Morphing for Low-Resource Languages. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 5004–5009.
  161. Gaze-based interaction for semi-automatic photo cropping. In Proceedings of the SIGCHI conference on Human Factors in computing systems. 771–780.
  162. Rick Sauber-Cole and Taghi M Khoshgoftaar. 2022. The use of generative adversarial networks to alleviate class imbalance in tabular data: a survey. Journal of Big Data 9, 1 (2022), 98.
  163. Mitigation of malicious attacks on networks. Proceedings of the National Academy of Sciences 108, 10 (2011), 3838–3841.
  164. ConvGeN: A convex space learning approach for deep-generative oversampling and imbalanced classification of small tabular datasets. Pattern Recognition (2023), 110138.
  165. Improving Neural Machine Translation Models with Monolingual Data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 86–96.
  166. Learning to clean: A GAN perspective. In Computer Vision–ACCV 2018 Workshops: 14th Asian Conference on Computer Vision, Perth, Australia, December 2–6, 2018, Revised Selected Papers 14. Springer, 174–185.
  167. Substructure Substitution: Structured Data Augmentation for NLP. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. 3494–3508.
  168. Connor Shorten and Taghi M Khoshgoftaar. 2019. A survey on image data augmentation for deep learning. Journal of big data 6, 1 (2019), 1–48.
  169. Text data augmentation for deep learning. Journal of big Data 8 (2021), 1–34.
  170. Best practices for convolutional neural networks applied to visual document analysis.. In Icdar, Vol. 3. Edinburgh.
  171. Slawek Smyl and Karthik Kuber. 2016. Data preprocessing and augmentation for multiple short time series forecasting with recurrent neural networks. In 36th international symposium on forecasting.
  172. Saint: Improved neural networks for tabular data via row attention and contrastive pre-training. arXiv preprint arXiv:2106.01342 (2021).
  173. Topological regularization for graph neural networks augmentation. arXiv preprint arXiv:2104.02478 (2021).
  174. Fairdrop: Biased edge dropout for enhancing fairness in graph representation learning. IEEE Transactions on Artificial Intelligence 3, 3 (2021), 344–354.
  175. Odongo Steven Eyobu and Dong Seog Han. 2018. Feature representation and data augmentation for human activity classification based on wearable IMU sensor data using a deep LSTM neural network. Sensors 18, 9 (2018), 2892.
  176. Automated graph representation learning for node classification. In 2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 1–7.
  177. MoCL: data-driven molecular fingerprint via knowledge-aware contrastive learning from molecular graph. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 3585–3594.
  178. Graph structure learning with variational information bottleneck. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 4165–4174.
  179. Sugar: Subgraph neural network with reinforcement pooling and self-supervised mutual information mechanism. In Proceedings of the Web Conference 2021. 2081–2091.
  180. Adversarial graph augmentation to improve graph contrastive learning. Advances in Neural Information Processing Systems 34 (2021), 15920–15933.
  181. Data augmentation using random image cropping and patching for deep CNNs. IEEE Transactions on Circuits and Systems for Video Technology 30, 9 (2019), 2917–2931.
  182. Large-Scale Representation Learning on Graphs via Bootstrapping. In International Conference on Learning Representations.
  183. Understanding over-squashing and bottlenecks on graphs via curvature. In International Conference on Learning Representations.
  184. Bayesian generative active deep learning. In International Conference on Machine Learning. PMLR, 6295–6304.
  185. Subtab: Subsetting features of tabular data for self-supervised representation learning. Advances in Neural Information Processing Systems 34 (2021), 18853–18865.
  186. Data augmentation of wearable sensor data for parkinson’s disease monitoring using convolutional neural networks. In Proceedings of the 19th ACM international conference on multimodal interaction. 216–220.
  187. Attention is all you need. Advances in neural information processing systems 30 (2017).
  188. Manifold mixup: Better representations by interpolating hidden states. In International conference on machine learning. PMLR, 6438–6447.
  189. Graphmix: Improved training of gnns for semi-supervised learning. In Proceedings of the AAAI conference on artificial intelligence, Vol. 35. 10024–10032.
  190. DiGress: Discrete Denoising diffusion for graph generation. In Proceedings of the 11th International Conference on Learning Representations.
  191. Attentive Cutmix: An Enhanced Data Augmentation Approach for Deep Learning Based Image Classification. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 3642–3646.
  192. Group-wise reinforcement feature generation for optimal and explainable representation space reconstruction. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 1826–1834.
  193. Reinforcement-Enhanced Autoregressive Feature Transformation: Gradient-steered Search in Continuous Space for Postfix Expressions. In Thirty-seventh Conference on Neural Information Processing Systems.
  194. Perspective transformation data augmentation for object detection. IEEE Access 8 (2019), 4935–4943.
  195. Graphcrop: Subgraph cropping for graph classification. arXiv preprint arXiv:2009.10564 (2020).
  196. Mixup for node and graph classification. In Proceedings of the Web Conference 2021. 3663–3674.
  197. Optimized scale-and-stretch for image resizing. In ACM SIGGRAPH Asia 2008 papers. 1–8.
  198. Zifeng Wang and Jimeng Sun. 2022. Transtab: Learning transferable tabular transformers across tables. Advances in Neural Information Processing Systems 35 (2022), 2902–2915.
  199. Jason Wei and Kai Zou. 2019. EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 6382–6388.
  200. RobustSTL: A robust seasonal-trend decomposition algorithm for long time series. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 5409–5416.
  201. Time Series Data Augmentation for Deep Learning: A Survey. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization.
  202. Traceable Automatic Feature Transformation via Cascading Actor-Critic Agents. In Proceedings of the 2023 SIAM International Conference on Data Mining (SDM). SIAM, 775–783.
  203. Beyond Discrete Selection: Continuous Embedding Space Optimization for Generative Feature Selection. In 2023 IEEE International Conference on Data Mining (ICDM). IEEE Computer Society, 688–697.
  204. Data noising as smoothing in neural network language models. arXiv preprint arXiv:1703.02573 (2017).
  205. Modeling tabular data using conditional GAN. Advances in Neural Information Processing Systems 32 (2019).
  206. Improved relation classification by deep recurrent neural networks with data augmentation. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. 1461–1470.
  207. Learning the change for automatic image cropping. In Proceedings of the IEEE conference on computer vision and pattern recognition. 971–978.
  208. Graph adversarial self-supervised learning. Advances in Neural Information Processing Systems 34 (2021), 14887–14899.
  209. Generative Data Augmentation for Commonsense Reasoning. In Findings of the Association for Computational Linguistics: EMNLP 2020. 1008–1025.
  210. Region-aware random erasing. In 2019 IEEE 19th International Conference on Communication Technology (ICCT). IEEE, 1699–1703.
  211. GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation. In Findings of the Association for Computational Linguistics: EMNLP 2021. 2225–2239.
  212. Time-series generative adversarial networks. Advances in Neural Information Processing Systems 32 (2019).
  213. VIME: Extending the success of self- and semi-supervised learning to tabular domain. Advances in Neural Information Processing Systems 33 (2020), 11033–11043.
  214. Graph contrastive learning with augmentations. Advances in Neural Information Processing Systems 33 (2020), 5812–5823.
  215. Hierarchical data augmentation and the application in text classification. IEEE Access 7 (2019), 185476–185485.
  216. Semi-supervised and self-supervised classification with multi-view graph neural networks. In Proceedings of the 30th ACM international conference on information & knowledge management. 2466–2476.
  217. CutMix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE/CVF international conference on computer vision. 6023–6032.
  218. Data-centric artificial intelligence: A survey. arXiv preprint arXiv:2303.10158 (2023).
  219. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412 (2017).
  220. Person Re-identification with pose variation aware data augmentation. Neural Computing and Applications 34, 14 (2022), 11817–11830.
  221. Probabilistic graphlet transfer for photo cropping. IEEE Transactions on Image Processing 22, 2 (2012), 802–815.
  222. A survey on graph diffusion models: Generative AI in science for molecule, protein and material. arXiv preprint arXiv:2304.01565 (2023).
  223. TFWT: Tabular Feature Weighting with Transformer. arXiv preprint arXiv:2405.08403 (2024).
  224. Heterogeneous graph structure learning for graph neural networks. In Proceedings of the AAAI conference on artificial intelligence, Vol. 35. 4697–4705.
  225. Graph data augmentation for graph machine learning: A survey. arXiv preprint arXiv:2202.08871 (2022).
  226. Robust graph representation learning via neural sparsification. In International Conference on Machine Learning. PMLR, 11458–11468.
  227. Random erasing data augmentation. In Proceedings of the AAAI conference on artificial intelligence, Vol. 34. 13001–13008.
  228. Data Augmentation on Graphs: A Survey. arXiv preprint arXiv:2212.09970 (2022).
  229. Xianda Zhou and William Yang Wang. 2017. MojiTalk: Generating emotional responses at scale. arXiv preprint arXiv:1711.04090 (2017).
  230. Xiaojin Zhu and Zoubin Ghahramani. 2002. Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University (2002).
  231. A survey on graph structure learning: Progress and opportunities. arXiv preprint arXiv:2103.03036 (2021).