Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 79 tok/s
Gemini 2.5 Pro 55 tok/s Pro
GPT-5 Medium 27 tok/s Pro
GPT-5 High 26 tok/s Pro
GPT-4o 85 tok/s Pro
GPT OSS 120B 431 tok/s Pro
Kimi K2 186 tok/s Pro
2000 character limit reached

Evaluation of human-model prediction difference on the Internet Scale of Data (2312.03291v2)

Published 6 Dec 2023 in cs.LG and cs.AI

Abstract: Evaluating models on datasets often fails to capture their behavior when faced with unexpected and diverse types of inputs. It would be beneficial if we could evaluate the difference between human annotation and model prediction for an internet number of inputs, or more generally, for an input space that enumeration is computationally impractical. Traditional model evaluation methods rely on precision and recall (PR) as metrics, which are typically estimated by comparing human annotations with model predictions on a specific dataset. This is feasible because enumerating thousands of test inputs is manageable. However, estimating PR across a large input space is challenging because enumeration becomes computationally infeasible. We propose OmniInput, a novel approach to evaluate and compare NNs by the PR of an input space. OmniInput is distinctive from previous works as its estimated PR reflects the estimation of the differences between human annotation and model prediction in the input space which is usually too huge to be enumerated. We empirically validate our method within an enumerable input space, and our experiments demonstrate that OmniInput can effectively estimate and compare precision and recall for (large) LLMs within a broad input space that is not enumerable.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Ai Generate Text Spark Streamline Icon: https://streamlinehq.com

Paper Prompts

Sign up for free to create and run prompts on this paper using GPT-5.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-up Questions

We haven't generated follow-up questions for this paper yet.