Generative AI for Immersive Communication: The Next Frontier in Internet-of-Senses Through 6G (2404.01713v2)

Published 2 Apr 2024 in cs.CL, cs.AI, cs.HC, cs.MM, and cs.NI

Abstract: Over the past two decades, the Internet-of-Things (IoT) has become a transformative concept, and as we approach 2030, a new paradigm known as the Internet of Senses (IoS) is emerging. Unlike conventional Virtual Reality (VR), IoS seeks to provide multi-sensory experiences, acknowledging that in our physical reality, our perception extends far beyond just sight and sound; it encompasses a range of senses. This article explores the existing technologies driving immersive multi-sensory media, delving into their capabilities and potential applications. This exploration includes a comparative analysis between conventional immersive media streaming and a proposed use case that leverages semantic communication empowered by generative AI. The focal point of this analysis is the proposed scheme's substantial 99.93% reduction in bandwidth consumption. Through this comparison, we aim to underscore the practical applications of generative AI for immersive media. Concurrently, we address major challenges in this field, such as temporally synchronizing multiple media streams, ensuring high throughput, minimizing End-to-End (E2E) latency, and maintaining robustness to low bandwidth, while outlining future trajectories.
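The headline bandwidth figure is easiest to appreciate with a back-of-envelope calculation. The sketch below is not taken from the paper: the bitrate, caption size, and update rate are illustrative assumptions chosen only to show how transmitting compact semantic descriptions, to be re-rendered by a generative model at the receiver, can plausibly yield a reduction on the order of 99.93% compared with streaming encoded video.

```python
# Back-of-envelope comparison: conventional video streaming vs. a
# semantic-communication link that sends short text descriptions,
# which a generative model at the receiver turns back into media.
# All numbers are illustrative assumptions, not values from the paper.

CONVENTIONAL_BITRATE_BPS = 10_000_000  # assumed ~10 Mbps encoded video stream

CAPTION_BYTES = 175        # assumed size of one scene description, in bytes
CAPTIONS_PER_SECOND = 5    # assumed update rate of the semantic descriptions

# Rate of the semantic link: description size (in bits) times update rate.
semantic_bitrate_bps = CAPTION_BYTES * 8 * CAPTIONS_PER_SECOND  # 7,000 bps

reduction = 1 - semantic_bitrate_bps / CONVENTIONAL_BITRATE_BPS
print(f"semantic link rate:  {semantic_bitrate_bps:,} bps")
print(f"bandwidth reduction: {reduction:.2%}")  # prints 99.93%
```

Under these assumed numbers the semantic link needs only about 7 kbps, so the savings are dominated by how compactly the scene can be described rather than by codec efficiency; the actual figures depend on the captioning granularity and the fidelity required of the receiver-side reconstruction.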
