Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
91 tokens/sec
Gemini 2.5 Pro Premium
42 tokens/sec
GPT-5 Medium
20 tokens/sec
GPT-5 High Premium
26 tokens/sec
GPT-4o
92 tokens/sec
DeepSeek R1 via Azure Premium
92 tokens/sec
GPT OSS 120B via Groq Premium
485 tokens/sec
Kimi K2 via Groq Premium
197 tokens/sec
2000 character limit reached

Beyond External Monitors: Enhancing Transparency of Large Language Models for Easier Monitoring (2502.05242v2)

Published 7 Feb 2025 in cs.CL, cs.AI, cs.CV, and cs.LG

Abstract: LLMs are becoming increasingly capable, but the mechanisms of their thinking and decision-making process remain unclear. Chain-of-thoughts (CoTs) have been commonly utilized to monitor LLMs, but this strategy fails to accurately reflect LLMs' thinking process. Techniques based on LLMs' hidden representations provide an inner perspective to monitor their latent thinking. However, previous methods only try to develop external monitors instead of making LLMs themselves easier to monitor. In this paper, we propose a novel method TELLME, improving the transparency of LLMs and helping monitors identify unsuitable and sensitive behaviors. Furthermore, we showcase the applications of TELLME on trustworthiness tasks (\eg, safety risks monitoring tasks and detoxification tasks), where LLMs achieve consistent improvement in transparency and task performance. More crucially, we theoretically analyze the improvement of TELLME on LLMs' generalization ability through optimal transport theory.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.