mForms: Multimodal Form-Filling with Question Answering (2011.12340v4)

Published 24 Nov 2020 in cs.AI

Abstract: This paper presents a new approach to form-filling by reformulating the task as multimodal natural language Question Answering (QA). The reformulation is achieved by first translating the elements on the GUI form (text fields, buttons, icons, etc.) to natural language questions, where these questions capture the element's multimodal semantics. After a match is determined between the form element (Question) and the user utterance (Answer), the form element is filled through a pre-trained extractive QA system. By leveraging pre-trained QA models and not requiring form-specific training, this approach to form-filling is zero-shot. The paper also presents an approach to further refine the form-filling by using multi-task training to incorporate a potentially large number of successive tasks. Finally, the paper introduces a multimodal natural language form-filling dataset Multimodal Forms (mForms), as well as a multimodal extension of the popular ATIS dataset to support future research and experimentation. Results show the new approach not only maintains robust accuracy for sparse training conditions but achieves state-of-the-art F1 of 0.97 on ATIS with approximately 1/10th of the training data.
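
The pipeline the abstract describes (form element to question, then extractive QA over the user utterance) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `FormElement`, `element_to_question`, `fill_form`, and `toy_qa` are hypothetical names, the question template is an assumption, and `toy_qa` is a hand-written regex stand-in for the pre-trained extractive QA model the paper relies on.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class FormElement:
    """A GUI form element (hypothetical representation)."""
    name: str   # internal slot name, e.g. "from_city"
    kind: str   # element type, e.g. "text_field"
    label: str  # visible text shown next to the element


# Any extractive QA system fits this shape: (question, context) -> span or None.
QAModel = Callable[[str, str], Optional[str]]


def element_to_question(element: FormElement) -> str:
    """Turn a form element into a natural language question (assumed template)."""
    return f"What is the {element.label.lower()}?"


def fill_form(elements: List[FormElement], utterance: str, qa: QAModel) -> Dict[str, str]:
    """Ask one question per element and keep the answers the QA system finds."""
    filled: Dict[str, str] = {}
    for element in elements:
        answer = qa(element_to_question(element), utterance)
        if answer is not None:
            filled[element.name] = answer
    return filled


def toy_qa(question: str, context: str) -> Optional[str]:
    """Regex stand-in for a pre-trained extractive QA model (illustration only)."""
    patterns = {
        "departure city": r"\bfrom (\w+)",
        "arrival city": r"\bto (\w+)",
    }
    for key, pattern in patterns.items():
        if key in question.lower():
            match = re.search(pattern, context)
            return match.group(1) if match else None
    return None  # question the stand-in cannot answer


elements = [
    FormElement("from_city", "text_field", "Departure city"),
    FormElement("to_city", "text_field", "Arrival city"),
    FormElement("date", "text_field", "Travel date"),
]
result = fill_form(elements, "Book a flight from Boston to Denver", toy_qa)
print(result)
```

Because `fill_form` only depends on the `QAModel` callable, the regex stand-in could be swapped for a real pre-trained extractive QA model without changing the form-side code, which is the property that makes the approach zero-shot with respect to new forms.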

Authors (3)
  1. Larry Heck (41 papers)
  2. Simon Heck (1 paper)
  3. Anirudh Sundar (8 papers)