GenQuery: Supporting Expressive Visual Search with Generative Models (2310.01287v2)
Abstract: Designers rely on visual search to explore and develop ideas in early design stages. However, designers can struggle to formulate suitable text queries to initiate a search, or to find images that adequately express their intent for use in similarity-based search. We propose GenQuery, a novel system that integrates generative models into the visual search process. GenQuery can automatically elaborate on users' queries and surface concrete search directions when users only have abstract ideas. To support precise expression of search intents, the system enables users to generatively modify images and use these in similarity-based search. In a comparative user study (N=16), designers felt that they could more accurately express their intents and find more satisfactory outcomes with GenQuery compared to a tool without generative features. Furthermore, the unpredictability of the generated outputs allowed participants to uncover more diverse outcomes. By supporting both convergence and divergence, GenQuery led to a more creative experience.
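Similarity-based search, which the abstract describes being applied to both existing and generatively modified images, is commonly implemented by ranking indexed images by the cosine similarity of their embedding vectors. The following is a minimal sketch under stated assumptions: each image is assumed to have already been encoded as a fixed-length vector (for instance by a CLIP-style encoder), and the function names and toy three-dimensional vectors are hypothetical illustrations, not GenQuery's actual implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every indexed image against the query embedding and keep the top_k.

    Hypothetical helper: `index` maps an image id to its precomputed embedding.
    In a system like the one described, `query` could be the embedding of an
    image the user has generatively modified.
    """
    scored = [(image_id, cosine_similarity(query, emb)) for image_id, emb in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy embeddings standing in for real image encodings (assumed values).
    index = {
        "img_a": [1.0, 0.0, 0.0],
        "img_b": [0.7, 0.7, 0.0],
        "img_c": [0.0, 0.0, 1.0],
    }
    query = [0.9, 0.1, 0.0]
    for image_id, score in rank_by_similarity(query, index, top_k=2):
        print(image_id, round(score, 3))
```

A brute-force scan like this is only practical for small collections; large image corpora are typically served through an approximate nearest-neighbor index instead, but the ranking criterion stays the same.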