
Text2Data: Low-Resource Data Generation with Textual Control

Published 8 Feb 2024 in cs.CL, cs.AI, and cs.LG | arXiv:2402.10941v2

Abstract: Natural language serves as a common and straightforward signal for humans to interact seamlessly with machines. Recognizing the importance of this interface, the machine learning community has invested considerable effort in generating data that is semantically coherent with textual instructions. While strides have been made in text-to-data generation spanning image editing, audio synthesis, video creation, and beyond, low-resource areas characterized by expensive annotations or complex data structures, such as molecules, motion dynamics, and time series, often lack textual labels. This deficiency impedes supervised learning, constraining the application of advanced generative models to text-to-data tasks. To address these challenges in the low-resource scenario, we propose Text2Data, a novel approach that uses unlabeled data to learn the underlying data distribution with an unsupervised diffusion model. The model then undergoes controllable finetuning via a novel constraint-optimization-based learning objective that ensures controllability while counteracting catastrophic forgetting. Comprehensive experiments demonstrate that Text2Data achieves superior controllability across various modalities, including molecules, motions, and time series, compared to existing baselines.
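The core idea in the abstract, finetuning for controllability subject to not drifting away from the pretrained distribution, can be illustrated with a toy constrained-optimization sketch. This is not the paper's actual diffusion objective: the two quadratic losses below are hypothetical stand-ins (`L_unlab` for the unsupervised pretraining loss, `L_ctrl` for the controllability loss on the small labeled subset), and the primal-dual penalty loop is one generic way to enforce a "do not forget" constraint of the form L_unlab(theta) <= L_unlab(theta_pretrained) + eps.

```python
# Toy sketch of constrained finetuning against catastrophic forgetting.
# Assumption: quadratic surrogate losses on a scalar parameter, NOT the
# paper's real diffusion losses or its exact optimization algorithm.

def L_unlab(theta):
    """Stand-in pretraining (distribution) loss, minimized at theta = 0."""
    return theta ** 2

def L_ctrl(theta):
    """Stand-in controllability loss; pulls theta toward 2.0."""
    return (theta - 2.0) ** 2

def grad(f, x, h=1e-6):
    """Central-difference gradient, to keep the sketch dependency-free."""
    return (f(x + h) - f(x - h)) / (2 * h)

def constrained_finetune(theta0, eps=0.5, lam_step=0.5, lr=0.05, iters=2000):
    """Primal-dual loop for:
         min L_ctrl(theta)  s.t.  L_unlab(theta) <= L_unlab(theta0) + eps
       i.e. improve controllability without forgetting the pretrained fit."""
    budget = L_unlab(theta0) + eps
    theta, lam = theta0, 0.0
    for _ in range(iters):
        # Gradient step on the Lagrangian L_ctrl + lam * (L_unlab - budget)
        g = grad(L_ctrl, theta) + lam * grad(L_unlab, theta)
        theta -= lr * g
        # Dual ascent: the multiplier grows only while the constraint is violated
        lam = max(0.0, lam + lam_step * (L_unlab(theta) - budget))
    return theta

theta = constrained_finetune(theta0=0.0)
```

With a tight budget the solution stops at the constraint boundary (here theta ≈ sqrt(0.5) rather than the unconstrained optimum 2.0); loosening `eps` recovers ordinary finetuning. The paper's actual objective operates on diffusion-model losses over unlabeled and labeled data, but the trade-off structure is the same.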

