Large Language and Text-to-3D Models for Engineering Design Optimization (2307.01230v1)
Abstract: Recent advances in generative AI, i.e., large neural network models capable of producing essays, images, music, and even 3D assets from text prompts, create opportunities for a multitude of disciplines. In the present paper, we study the potential of deep text-to-3D models in the engineering domain, focusing on the opportunities and challenges of integrating and interacting with 3D assets in computational, simulation-based design optimization. In contrast to traditional design optimization of 3D geometries, which typically searches for optimal designs over numerical representations, such as B-spline surface or deformation parameters in vehicle aerodynamic optimization, natural language challenges the optimization framework by requiring a different interpretation of variation operators, while at the same time it may ease and motivate human user interaction. Here, we propose and realize a fully automated evolutionary design optimization framework using Shap-E, a recently published text-to-3D asset network by OpenAI, in the context of aerodynamic vehicle optimization. For representing text prompts in the evolutionary optimization, we evaluate (a) a bag-of-words approach based on prompt templates and WordNet samples, and (b) a tokenization approach based on prompt templates and the byte pair encoding method from GPT-4. Our main findings indicate, first, that it is important to ensure that designs generated from prompts remain within the object class of the application, i.e., diverse and novel designs still need to be realistic, and, second, that more research is required to develop methods in which the strength of text prompt variations and the resulting variations of the 3D designs share a causal relation to some degree, so as to improve the optimization.
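The bag-of-words variant of the framework can be sketched as a simple (1+1) evolutionary loop over prompt-template slots. Everything below is an illustrative assumption, not the authors' code: the small word pool stands in for WordNet sampling, and the `fitness` stub stands in for the real pipeline (prompt → Shap-E mesh → CFD drag evaluation), which cannot be reproduced here.

```python
import random

# Slots of a prompt template and candidate words per slot.
# In the paper these words are sampled from WordNet; this fixed
# pool is an illustrative stand-in.
WORD_POOL = {
    "style": ["streamlined", "compact", "boxy", "futuristic"],
    "object": ["car", "sports car", "van", "pickup truck"],
}
TEMPLATE = "a {style} {object}"


def render(genotype):
    """Turn a slot assignment into the text prompt fed to the text-to-3D model."""
    return TEMPLATE.format(**genotype)


def sample_genotype(rng):
    """Random initial prompt: one word drawn per template slot."""
    return {slot: rng.choice(words) for slot, words in WORD_POOL.items()}


def mutate(genotype, rng):
    """Variation operator: re-sample the word in one randomly chosen slot."""
    child = dict(genotype)
    slot = rng.choice(list(WORD_POOL))
    child[slot] = rng.choice(WORD_POOL[slot])
    return child


def fitness(genotype):
    """Placeholder for the real objective (aerodynamic drag of the
    Shap-E-generated mesh); here 'streamlined sports car' is simply
    rewarded so the loop has something to optimize."""
    score = 0.0
    if genotype["style"] == "streamlined":
        score += 1.0
    if genotype["object"] == "sports car":
        score += 1.0
    return score


def evolve(generations=100, seed=0):
    """(1+1) evolution strategy on the prompt genotype."""
    rng = random.Random(seed)
    parent = sample_genotype(rng)
    best = (fitness(parent), parent)
    for _ in range(generations):
        child = mutate(best[1], rng)
        f = fitness(child)
        if f >= best[0]:  # accept equal fitness to allow neutral drift
            best = (f, child)
    return best


if __name__ == "__main__":
    best_f, best_g = evolve()
    print(best_f, render(best_g))
```

The tokenization variant described in the abstract would replace the per-slot word pool with mutations on BPE token sequences; the loop structure stays the same.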
- R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brunskill, E. Brynjolfsson, S. Buch, D. Card, R. Castellon, N. Chatterji, A. Chen, K. Creel, J. Q. Davis, D. Demszky, C. Donahue, M. Doumbouya, E. Durmus, S. Ermon, J. Etchemendy, K. Ethayarajh, L. Fei-Fei, C. Finn, T. Gale, L. Gillespie, K. Goel, N. Goodman, S. Grossman, N. Guha, T. Hashimoto, P. Henderson, J. Hewitt, D. E. Ho, J. Hong, K. Hsu, J. Huang, T. Icard, S. Jain, D. Jurafsky, P. Kalluri, S. Karamcheti, G. Keeling, F. Khani, O. Khattab, P. W. Koh, M. Krass, R. Krishna, R. Kuditipudi, A. Kumar, F. Ladhak, M. Lee, T. Lee, J. Leskovec, I. Levent, X. L. Li, X. Li, T. Ma, A. Malik, C. D. Manning, S. Mirchandani, E. Mitchell, Z. Munyikwa, S. Nair, A. Narayan, D. Narayanan, B. Newman, A. Nie, J. C. Niebles, H. Nilforoshan, J. Nyarko, G. Ogut, L. Orr, I. Papadimitriou, J. S. Park, C. Piech, E. Portelance, C. Potts, A. Raghunathan, R. Reich, H. Ren, F. Rong, Y. Roohani, C. Ruiz, J. Ryan, C. Ré, D. Sadigh, S. Sagawa, K. Santhanam, A. Shih, K. Srinivasan, A. Tamkin, R. Taori, A. W. Thomas, F. Tramèr, R. E. Wang, W. Wang, B. Wu, J. Wu, Y. Wu, S. M. Xie, M. Yasunaga, J. You, M. Zaharia, M. Zhang, T. Zhang, X. Zhang, Y. Zhang, L. Zheng, K. Zhou, and P. Liang, “On the Opportunities and Risks of Foundation Models,” 2022. [Online]. Available: https://arxiv.org/abs/2108.07258
- A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Language Models are Unsupervised Multitask Learners,” OpenAI Blog, 2019. [Online]. Available: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
- R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-Resolution Image Synthesis with Latent Diffusion Models,” CoRR, vol. abs/2112.10752, 2021. [Online]. Available: https://arxiv.org/abs/2112.10752
- S. Menzel and B. Sendhoff, “Representing the Change - Free Form Deformation for Evolutionary Design Optimisation,” in Evolutionary Computation in Practice, T. Yu, D. Davis, C. Baydar, and R. Roy, Eds. Berlin: Springer, 2008, ch. 4, pp. 63–86.
- M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst, “Geometric Deep Learning: Going beyond Euclidean Data,” IEEE Signal Processing Magazine, vol. 34, no. 4, pp. 18–42, 2017.
- P. Achlioptas, O. Diamanti, I. Mitliagkas, and L. Guibas, “Learning Representations and Generative Models for 3D Point Clouds,” in Proceedings of the 35th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, J. Dy and A. Krause, Eds., vol. 80. PMLR, 10–15 Jul 2018, pp. 40–49. [Online]. Available: http://proceedings.mlr.press/v80/achlioptas18a.html
- H. Jun and A. Nichol, “Shap-E: Generating Conditional 3D Implicit Functions,” 2023. [Online]. Available: https://arxiv.org/abs/2305.02463
- T. Rios, B. Van Stein, T. Bäck, B. Sendhoff, and S. Menzel, “Point2FFD: Learning Shape Representations of Simulation-Ready 3D Models for Engineering Design Optimization,” in 2021 International Conference on 3D Vision (3DV), 2021, pp. 1024–1033.
- N. Umetani, “Exploring Generative 3D Shapes Using Autoencoder Networks,” in SIGGRAPH Asia 2017 Technical Briefs, ser. SA ’17. New York, NY, USA: Association for Computing Machinery, 2017. [Online]. Available: https://doi.org/10.1145/3145749.3145758
- T. Rios, B. van Stein, P. Wollstadt, T. Bäck, B. Sendhoff, and S. Menzel, “Exploiting Local Geometric Features in Vehicle Design Optimization with 3D Point Cloud Autoencoders,” in 2021 IEEE Congress on Evolutionary Computation (CEC), 2021, pp. 514–521.
- T. Rios, B. van Stein, T. Bäck, B. Sendhoff, and S. Menzel, “Multitask Shape Optimization Using a 3-D Point Cloud Autoencoder as Unified Representation,” IEEE Transactions on Evolutionary Computation, vol. 26, no. 2, pp. 206–217, 2022.
- S. Saha, S. Menzel, L. L. Minku, X. Yao, B. Sendhoff, and P. Wollstadt, “Quantifying The Generative Capabilities Of Variational Autoencoders For 3D Car Point Clouds,” in 2020 IEEE Symposium Series on Computational Intelligence (SSCI), 2020, pp. 1469–1477.
- S. Saha, T. Rios, L. L. Minku, B. V. Stein, P. Wollstadt, X. Yao, T. Bäck, B. Sendhoff, and S. Menzel, “Exploiting Generative Models for Performance Predictions of 3D Car Designs,” in 2021 IEEE Symposium Series on Computational Intelligence (SSCI), 2021, pp. 1–9.
- A. Nichol, H. Jun, P. Dhariwal, P. Mishkin, and M. Chen, “Point-E: A System for Generating 3D Point Clouds from Complex Prompts,” 2022.
- A. Nichol, P. Dhariwal, A. Ramesh, P. Shyam, P. Mishkin, B. McGrew, I. Sutskever, and M. Chen, “GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models,” in Proceedings of the 39th International Conference on Machine Learning, PMLR 162, 2022, pp. 16784–16804.
- A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, M. Chen, and I. Sutskever, “Zero-Shot Text-to-Image Generation,” in Proceedings of the 38th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, M. Meila and T. Zhang, Eds., vol. 139. PMLR, 18–24 Jul 2021, pp. 8821–8831. [Online]. Available: https://proceedings.mlr.press/v139/ramesh21a.html
- B. Poole, A. Jain, J. T. Barron, and B. Mildenhall, “DreamFusion: Text-to-3D using 2D Diffusion,” arXiv, 2022. [Online]. Available: https://arxiv.org/abs/2209.14988
- C.-H. Lin, J. Gao, L. Tang, T. Takikawa, X. Zeng, X. Huang, K. Kreis, S. Fidler, M.-Y. Liu, and T.-Y. Lin, “Magic3D: High-Resolution Text-to-3D Content Creation,” 2023. [Online]. Available: https://arxiv.org/abs/2211.10440
- J. Gao, T. Shen, Z. Wang, W. Chen, K. Yin, D. Li, O. Litany, Z. Gojcic, and S. Fidler, “GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images,” 2022. [Online]. Available: https://arxiv.org/abs/2209.11163
- M. A. Bautista, P. Guo, S. Abnar, W. Talbott, A. Toshev, Z. Chen, L. Dinh, S. Zhai, H. Goh, D. Ulbricht, A. Dehghan, and J. Susskind, “GAUDI: A Neural Architect for Immersive 3D Scene Generation,” 2022. [Online]. Available: https://arxiv.org/abs/2207.13751
- L. Reynolds and K. McDonnell, “Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm,” in CHI EA ’21: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, 2021, pp. 1–7.
- T. Martins, J. M. Cunha, J. Correia, and P. Machado, “Towards the evolution of prompts with metaprompter,” in Artificial Intelligence in Music, Sound, Art and Design, C. Johnson, N. Rodríguez-Fernández, and S. M. Rebelo, Eds. Cham: Springer Nature Switzerland, 2023, pp. 180–195.
- M. Wong, Y.-S. Ong, A. Gupta, K. K. Bali, and C. Chen, “Prompt Evolution for Generative AI: A Classifier-Guided Approach,” 2023. [Online]. Available: https://arxiv.org/abs/2305.16347
- N. Arechiga, F. Permenter, B. Song, and C. Yuan, “Drag-Guided Diffusion Models for Vehicle Image Generation,” 2023. [Online]. Available: https://arxiv.org/abs/2306.09935
- Y. Jin, M. Olhofer, and B. Sendhoff, “A framework for evolutionary optimization with approximate fitness functions,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 5, pp. 481–494, 2002.
- Z. Wu and M. Palmer, “Verb semantics and lexical selection,” in 32nd Annual Meeting of the Association for Computational Linguistics. Las Cruces, New Mexico, USA: Association for Computational Linguistics, Jun. 1994, pp. 133–138. [Online]. Available: https://aclanthology.org/P94-1019
- OpenAI, “GPT-4 Technical Report,” 2023. [Online]. Available: https://arxiv.org/abs/2303.08774
- N. Hansen and A. Ostermeier, “Completely derandomized self-adaptation in evolution strategies,” Evolutionary Computation, vol. 9, no. 2, pp. 159–195, 2001.
- T. Bäck and H. Schwefel, “Evolutionary Computation: An Overview,” in Proceedings of IEEE International Conference on Evolutionary Computation, 1996, pp. 20–29.
- P. Dubey, M. Y. Pramod, A. S. Kumar, and B. T. Kannan, “Numerical simulation of flow over a racing motorbike using OpenFOAM®,” AIP Conference Proceedings, vol. 2277, no. 1, p. 030029, Nov. 2020. [Online]. Available: https://doi.org/10.1063/5.0025199
- H. Fan, H. Su, and L. Guibas, “A Point Set Generation Network for 3D Object Reconstruction from a Single Image,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Los Alamitos, CA, USA: IEEE Computer Society, Jul. 2017, pp. 2463–2471. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/CVPR.2017.264
Authors: Thiago Rios, Stefan Menzel, Bernhard Sendhoff