- The paper's main contribution is the proposal of neologisms: new words intended to bridge the conceptual gap between human vocabulary and AI models.
- It demonstrates, through experiments with length and diversity neologisms, that targeted language cues can effectively modulate AI responses.
- The findings imply that developing a shared lexicon could advance AI interpretability and foster more intuitive human-machine interaction.
Understanding AI with Neologisms: A New Frontier in Human-Machine Communication
The paper "We Can't Understand AI Using our Existing Vocabulary" by Hewitt, Geirhos, and Kim from Google DeepMind presents an intriguing proposition in the field of AI interpretability and control. It argues for the development of neologisms, new words coined to bridge the conceptual gap between humans and AI models. The paper suggests that our current lexicon is insufficient for effectively communicating human concepts to machines and vice versa.
Summary of Main Arguments
The authors build their thesis on the premise that humans and machines, operating on different conceptual frameworks, require a shared language for effective communication. This communication problem necessitates the creation of neologisms to bridge these gaps. The paper emphasizes that successful neologisms strike the right balance between abstraction and detail, allowing them to be reusable across contexts without losing specificity.
To substantiate their argument, the authors provide a proof of concept by introducing two neologisms: a "length neologism" and a "diversity neologism," designed to control LLM response length and variability respectively. Their experiments show that these neologisms can guide the behavior of LLMs in a manner aligned with human intentions.
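The core mechanism, teaching a model a new word by learning an embedding for it while every existing parameter stays frozen, can be illustrated with a toy sketch. Everything below (the linear "model," the scalar "verbosity score," the learning rate, and the numbers) is a hypothetical stand-in for illustration, not the authors' actual implementation:

```python
import numpy as np

# Toy illustration: learn the embedding of one new "length" token while the
# rest of the (here, trivially linear) model stays frozen.
rng = np.random.default_rng(0)

d = 8                                  # embedding dimension
W = rng.normal(size=d)                 # frozen "model" weights
W /= np.linalg.norm(W)                 # normalized so gradient steps are stable

def verbosity(prompt_emb, neologism_emb):
    # The frozen model maps (prompt + neologism) embeddings to a scalar
    # "verbosity score" -- higher means a longer response.
    return float(W @ (prompt_emb + neologism_emb))

prompt = rng.normal(size=d)
# Goal: the neologism should push the score 5 points above the baseline.
target = verbosity(prompt, np.zeros(d)) + 5.0

e_new = np.zeros(d)                    # the ONLY trainable parameter
lr = 0.05
for _ in range(200):
    err = verbosity(prompt, e_new) - target   # squared-error loss: err**2
    e_new -= lr * 2 * err * W                 # dL/de_new = 2*err*W; W never updates

assert abs(verbosity(prompt, e_new) - target) < 1e-3
```

The design point this sketch captures is that only the new word's embedding carries the concept; because the model itself is untouched, its behavior on prompts without the neologism is unchanged.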
Implications and Speculations on Future Developments
The proposal to use neologisms to facilitate AI interpretability and control carries significant implications. On a practical level, it could lead to more user-friendly interfaces between AI systems and end-users, making AI systems more accessible and easier to control without extensive technical intervention. Who develops the neologisms and how they are integrated into systems could become a critical aspect of AI development and usage.
From a theoretical perspective, the notion of neologisms challenges researchers to rethink the framework of AI interpretability. It shifts the focus from retrofitting human concepts onto AI systems to proactively shaping a shared language. This perspective aligns with more cooperative interactions between human and machine agents. The creation of a hybrid lexicon could also advance fields like natural language processing and machine learning interpretability, as it demands insight into how both parties conceptualize their environments.
Looking to the future, this proposal could push the development of AI towards systems that not only respond to but also contribute to the evolution of human languages. Such a dynamic would require models not just to translate but to actively engage in conceptual exchange, further expanding their utility in collaborative environments. It also raises questions about the governance and ethics of who controls the formation and usage of these neologisms.
Numerical and Conceptual Highlights
The paper presents its numerical results cautiously, particularly for the length and diversity neologisms. While it refrains from grand claims, the authors provide convincing evidence through controlled experiments illustrating improved control over model behavior. For instance, the length neologism consistently produced significantly longer responses where desired, indicating effective modulation of model output in response to human instruction.
In terms of bold conceptual claims, the authors assert that this approach could counteract inherent biases in AI development, particularly confirmation bias and anthropomorphism. Their framing insists on the necessity of acknowledging and operationalizing the differences between human and machine understanding, a crucial step that many traditional interpretability efforts ignore or gloss over.
Conclusion
The notion of developing neologisms for AI interpretability and control represents a novel pathway for addressing the conceptual disparities between human users and AI systems. By seeking to create a joint human-machine lexicon, the research opens new avenues for improving the functionality and applicability of AI in complex, real-world contexts. While still in an exploratory phase, such a framework could redefine how we approach AI development, bridging human creativity and precision with machine learning capabilities. As AI systems continue to evolve, embracing such approaches will be crucial to harnessing their full potential while ensuring ethical and user-focused integration.