Diff-eRank: A Novel Rank-Based Metric for Evaluating Large Language Models (2401.17139v2)

Published 30 Jan 2024 in cs.LG, cs.AI, cs.CL, cs.IT, and math.IT

Abstract: Large Language Models (LLMs) have transformed natural language processing and extended their powerful capabilities to multi-modal domains. As LLMs continue to advance, it is crucial to develop diverse and appropriate metrics for their evaluation. In this paper, we introduce a novel rank-based metric, Diff-eRank, grounded in information theory and geometry principles. Diff-eRank assesses LLMs by analyzing their hidden representations, providing a quantitative measure of how efficiently they eliminate redundant information during training. We demonstrate the applicability of Diff-eRank in both single-modal (e.g., language) and multi-modal settings. For LLMs, our results show that Diff-eRank increases with model size and correlates well with conventional metrics such as loss and accuracy. In the multi-modal context, we propose an alignment evaluation method based on the eRank, and verify that contemporary multi-modal LLMs exhibit strong alignment performance based on our method. Our code is publicly available at https://github.com/waltonfuture/Diff-eRank.
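The abstract describes Diff-eRank only at a high level. The sketch below illustrates the quantity it builds on: the effective rank (eRank) of a matrix of hidden representations, taken as the exponential of the Shannon entropy of its normalized singular-value spectrum (Roy and Vetterli, 2007). It is an illustration under stated assumptions, not the authors' exact procedure: it assumes representations are mean-centered and row-normalized before forming a covariance matrix, and that Diff-eRank is the eRank of an untrained model's representations minus that of the trained model on the same inputs. The function names `effective_rank` and `diff_erank` are hypothetical; the reference implementation is in the repository linked in the abstract.

```python
import numpy as np


def effective_rank(reps: np.ndarray, eps: float = 1e-12) -> float:
    """Effective rank of hidden representations (hypothetical helper).

    reps: array of shape (num_tokens, hidden_dim), one row per token.
    Assumption: centering + row normalization + covariance, as a sketch.
    """
    # Center each hidden dimension across tokens.
    z = reps - reps.mean(axis=0, keepdims=True)
    # L2-normalize each token's representation so overall scale does not matter.
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)
    # Covariance-style matrix of the normalized representations (trace close to 1).
    cov = z.T @ z / z.shape[0]
    # Singular values of this symmetric PSD matrix (equal to its eigenvalues).
    sigma = np.linalg.svd(cov, compute_uv=False)
    # Turn the spectrum into a probability distribution; drop exact zeros.
    p = sigma / (sigma.sum() + eps)
    p = p[p > 0]
    # eRank = exp(Shannon entropy of the normalized spectrum).
    return float(np.exp(-(p * np.log(p)).sum()))


def diff_erank(untrained_reps: np.ndarray, trained_reps: np.ndarray) -> float:
    """Assumed reading of Diff-eRank: eRank before training minus eRank after."""
    return effective_rank(untrained_reps) - effective_rank(trained_reps)
```

Under this reading, a positive Diff-eRank means training has compressed the representations into fewer effective dimensions, consistent with the abstract's description of eliminating redundant information. Which layer and which tokens to extract hidden states from is not shown here and would follow the paper.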

GitHub

GitHub - waltonfuture/Matrix-Entropy


