Interactive Interior Design Recommendation via Coarse-to-fine Multimodal Reinforcement Learning (2310.07287v1)
Abstract: Personalized interior decoration design often incurs high labor costs. Recent efforts in developing intelligent interior design systems have focused on generating decoration designs from textual requirements while neglecting two problems: how to mine homeowners' hidden preferences, and how to choose a proper initial design. To fill this gap, we propose an Interactive Interior Design Recommendation System (IIDRS) based on reinforcement learning (RL). IIDRS aims to find an ideal plan by interacting with the user, who provides feedback on the gap between the recommended plan and their ideal one. To improve decision-making efficiency and effectiveness in large decoration spaces, we propose a Decoration Recommendation Coarse-to-Fine Policy Network (DecorRCFN). Additionally, to enhance generalization in online scenarios, we propose an object-aware feedback generation method that augments model training with diversified and dynamic textual feedback. Extensive experiments on a real-world dataset demonstrate that our method outperforms traditional methods by a large margin in terms of recommendation accuracy. Further user studies show that our method achieves higher real-world user satisfaction than baseline methods.
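The coarse-to-fine idea can be illustrated with a toy, tabular sketch. This is not the authors' DecorRCFN, which is a multimodal neural network. The sketch factorizes one recommendation into a coarse choice (a group of plans) followed by a fine choice (a plan inside that group), so the log-probability of an action is log π_coarse(g) + log π_fine(j | g). It then applies a REINFORCE-style policy-gradient update (Williams, 1992), assuming a simulated user who returns reward 1 only when the recommended plan matches a hidden "ideal" plan. All names here (`coarse_to_fine_step`, `reinforce_update`, `target`, the 3×3 plan layout) are hypothetical.

```python
import math
import random

def softmax(scores):
    # Numerically stable softmax over a list of raw scores (logits).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs, rng):
    # Draw one index from a categorical distribution.
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1

def coarse_to_fine_step(coarse_scores, fine_scores, rng):
    """Pick a coarse group, then a fine-grained plan inside that group.

    coarse_scores: one logit per group (hypothetical, e.g. a decoration style).
    fine_scores: dict mapping group index -> list of logits for its plans.
    Returns (group, plan, log_prob) with
    log_prob = log pi_coarse(group) + log pi_fine(plan | group).
    """
    p_coarse = softmax(coarse_scores)
    g = sample(p_coarse, rng)
    p_fine = softmax(fine_scores[g])
    j = sample(p_fine, rng)
    return g, j, math.log(p_coarse[g]) + math.log(p_fine[j])

def reinforce_update(coarse_scores, fine_scores, g, j, reward, lr=0.5):
    # For softmax logits, d log pi(a) / d logit_k = 1[k == a] - pi(k).
    # REINFORCE scales this gradient by the observed reward.
    p_c = softmax(coarse_scores)
    for k in range(len(coarse_scores)):
        coarse_scores[k] += lr * reward * ((1.0 if k == g else 0.0) - p_c[k])
    p_f = softmax(fine_scores[g])
    for k in range(len(fine_scores[g])):
        fine_scores[g][k] += lr * reward * ((1.0 if k == j else 0.0) - p_f[k])

def greedy(coarse_scores, fine_scores):
    # Deterministic recommendation: best group, then best plan within it.
    g = max(range(len(coarse_scores)), key=lambda k: coarse_scores[k])
    j = max(range(len(fine_scores[g])), key=lambda k: fine_scores[g][k])
    return g, j

# Usage: 3 hypothetical groups x 3 plans each; the simulated user's ideal plan is (1, 2).
rng = random.Random(0)
coarse = [0.0, 0.0, 0.0]
fine = {g: [0.0, 0.0, 0.0] for g in range(3)}
target = (1, 2)
for _ in range(2000):
    g, j, _ = coarse_to_fine_step(coarse, fine, rng)
    reward = 1.0 if (g, j) == target else 0.0
    reinforce_update(coarse, fine, g, j, reward)
print(greedy(coarse, fine))
```

The factorization means each decision scores only the groups plus the plans in one group, rather than every plan in the whole decoration space. That reduction is the efficiency argument behind a coarse-to-fine policy. In a real system the tabular logits would be replaced by network outputs conditioned on the current plan and the user's textual feedback.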