Counterfactual Generation with Identifiability Guarantees (2402.15309v1)

Published 23 Feb 2024 in cs.LG and cs.CL

Abstract: Counterfactual generation lies at the core of various machine learning tasks, including image translation and controllable text generation. This generation process usually requires the identification of the disentangled latent representations, such as content and style, that underlie the observed data. However, it becomes more challenging when faced with a scarcity of paired data and labeling information. Existing disentanglement methods crucially rely on oversimplified assumptions, such as assuming independent content and style variables, to identify the latent variables, even though such assumptions may not hold for complex data distributions. For instance, food reviews tend to involve words like "tasty", whereas movie reviews commonly contain words such as "thrilling" for the same positive sentiment. This problem is exacerbated when data are sampled from multiple domains, since the dependence between content and style may vary significantly across domains. In this work, we tackle the domain-varying dependence between the content and the style variables inherent in the counterfactual generation task. We provide identification guarantees for such latent-variable models by leveraging the relative sparsity of the influences from different latent variables. Our theoretical insights enable the development of a doMain AdapTive counTerfactual gEneration model (MATTE). Our theoretically grounded framework achieves state-of-the-art performance in unsupervised style transfer tasks, where neither paired data nor style labels are utilized, across four large-scale datasets. Code is available at https://github.com/hanqi-qi/Matte.git
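Reading the abstract, the setup can be sketched as a latent-variable model. The notation below (x, c, s, u, g) is illustrative shorthand for what the abstract describes, not taken from the paper itself:

```latex
% Illustrative notation (mine, not the paper's):
%   x : observed sample (e.g., a review sentence)
%   c : content latent,  s : style latent
%   u : domain index,    g : unknown mixing (generator) function
\[
  x = g(c, s), \qquad s \sim p(s \mid c, u)
\]
% Prior disentanglement work typically assumes independence,
% p(c, s) = p(c)\,p(s); here the content--style dependence
% p(s \mid c, u) is allowed to change across domains u.
% A counterfactual (e.g., a style-transferred sentence) swaps the
% style latent while holding the content fixed:
\[
  x' = g(c, s')
\]
```

On this reading, the "relative sparsity of the influences" in the abstract would be a structural constraint on how the latents enter g, which is what supplies the identification guarantee; consult the paper for the precise conditions.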
