
Evaluating Large Language Models as Virtual Annotators for Time-series Physical Sensing Data (2403.01133v2)

Published 2 Mar 2024 in cs.LG and eess.SP

Abstract: Traditional human-in-the-loop annotation of time-series data, such as inertial data, often requires access to alternate modalities like video or audio from the environment. These alternate sources provide the necessary context to the human annotator, as the raw numeric data is often too obfuscated even for an expert. However, this traditional approach raises concerns about overall cost, efficiency, storage of the additional modalities, time, scalability, and privacy. Interestingly, recent LLMs are trained on vast amounts of publicly available alphanumeric data, which allows them to comprehend and perform well on tasks beyond natural language processing. This opens up a potential avenue for using LLMs as virtual annotators, where the LLM is provided the raw sensor data directly instead of relying on any alternate modality, thereby mitigating the problems of the traditional human-in-the-loop approach. Motivated by this observation, we perform a detailed study in this paper to assess whether state-of-the-art (SOTA) LLMs can be used as virtual annotators for labeling time-series physical sensing data. To perform this in a principled manner, we divide the study into two major phases. In the first phase, we investigate the challenges an LLM like GPT-4 faces in comprehending raw sensor data. Informed by the observations from phase 1, in the second phase we investigate encoding the raw sensor data using SOTA self-supervised learning (SSL) approaches and utilizing the projected time-series representations to obtain annotations from the LLM. A detailed evaluation on four benchmark human activity recognition (HAR) datasets shows that SSL-based encoding and metric-based guidance allow the LLM to make more reasonable decisions and provide accurate annotations without requiring computationally expensive fine-tuning or sophisticated prompt engineering.
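
The sketch below illustrates the kind of phase-2 pipeline the abstract describes: encode sensor windows with an SSL encoder, then give the LLM metric-based guidance (distances from an unlabeled window's embedding to labeled anchor embeddings) instead of raw numeric streams. This is a minimal illustration under stated assumptions, not the paper's actual implementation: the encoder here is a stand-in random projection (in practice it would be a contrastively pretrained SSL model), and `query_llm` is a hypothetical placeholder for a call to GPT-4 or a similar model.

```python
# Minimal sketch of SSL-encoding + metric-based guidance for LLM annotation.
# `encode` and `query_llm` are hypothetical placeholders, not the paper's code.
import numpy as np

LABELS = ["walking", "sitting", "standing", "climbing stairs"]  # example HAR classes

def encode(window: np.ndarray) -> np.ndarray:
    """Stand-in for a pretrained SSL encoder mapping a (T, 3) accelerometer
    window to a fixed-length embedding; here a seeded random projection so
    anchors and queries share the same (illustrative) feature space."""
    rng = np.random.default_rng(0)
    proj = rng.standard_normal((window.shape[0] * window.shape[1], 32))
    return window.reshape(-1) @ proj

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def build_prompt(query_emb: np.ndarray, anchor_embs: dict) -> str:
    """Metric-based guidance: the prompt lists the query's distance to one
    labeled anchor embedding per class instead of raw sensor values."""
    lines = [f"{label}: {cosine_distance(query_emb, emb):.3f}"
             for label, emb in anchor_embs.items()]
    return ("Each line gives a candidate activity and the cosine distance from an\n"
            "unlabeled sensor-window embedding to a labeled example of that activity.\n"
            + "\n".join(lines)
            + "\nAnswer with the single most likely activity label.")

# One labeled anchor window per class (in practice, drawn from a small labeled pool).
anchors = {label: encode(np.random.randn(128, 3)) for label in LABELS}
query = encode(np.random.randn(128, 3))  # unlabeled window to annotate
prompt = build_prompt(query, anchors)
print(prompt)
# label = query_llm(prompt)  # hypothetical LLM call; no fine-tuning required
```

Because the guidance is plain text over a handful of distances, the LLM never sees raw numeric sensor streams, which is the key difference from phase 1 of the study.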

Authors (3)
  1. Aritra Hota (1 paper)
  2. Soumyajit Chatterjee (12 papers)
  3. Sandip Chakraborty (35 papers)
Citations (7)