GPT Sonograpy: Hand Gesture Decoding from Forearm Ultrasound Images via VLM (2407.10870v1)

Published 15 Jul 2024 in cs.CV, cs.AI, cs.HC, and cs.LG

Abstract: Large vision-language models (LVLMs), such as the Generative Pre-trained Transformer 4-omni (GPT-4o), are emerging multi-modal foundation models with great potential as artificial-intelligence (AI) assistance tools for a myriad of applications in the healthcare, industrial, and academic sectors. Although such foundation models perform well on a wide range of general tasks, their capability without fine-tuning is often limited on specialized tasks. However, full fine-tuning of large foundation models is challenging due to enormous computation/memory/dataset requirements. We show that GPT-4o can decode hand gestures from forearm ultrasound data even with no fine-tuning, and that its performance improves with few-shot, in-context learning.

Authors (4)
  1. Keshav Bimbraw (9 papers)
  2. Ye Wang (248 papers)
  3. Jing Liu (525 papers)
  4. Toshiaki Koike-Akino (71 papers)