
GenQuery: Supporting Expressive Visual Search with Generative Models (2310.01287v2)

Published 2 Oct 2023 in cs.HC

Abstract: Designers rely on visual search to explore and develop ideas in early design stages. However, designers can struggle to identify suitable text queries to initiate a search or to discover images for similarity-based search that can adequately express their intent. We propose GenQuery, a novel system that integrates generative models into the visual search process. GenQuery can automatically elaborate on users' queries and surface concrete search directions when users only have abstract ideas. To support precise expression of search intents, the system enables users to generatively modify images and use these in similarity-based search. In a comparative user study (N=16), designers felt that they could more accurately express their intents and find more satisfactory outcomes with GenQuery compared to a tool without generative features. Furthermore, the unpredictability of generations allowed participants to uncover more diverse outcomes. By supporting both convergence and divergence, GenQuery led to a more creative experience.

