Optimal Data Split Methodology for Model Validation (1108.6043v1)

Published 30 Aug 2011 in physics.data-an, math.PR, and stat.ME

Abstract: The decision to incorporate cross-validation into validation processes of mathematical models raises an immediate question: how should one partition the data into calibration and validation sets? We answer this question systematically: we present an algorithm to find the optimal partition of the data subject to certain constraints. While doing this, we address two critical issues: 1) that the model be evaluated with respect to predictions of a given quantity of interest and its ability to reproduce the data, and 2) that the model be highly challenged by the validation set, assuming it is properly informed by the calibration set. This framework also relies on the interaction between the experimentalist and/or modeler, who understand the physical system and the limitations of the model; the decision-maker, who understands and can quantify the cost of model failure; and the computational scientists, who strive to determine whether the model satisfies both the modeler's and decision-maker's requirements. We also note that our framework is quite general, and may be applied to a wide range of problems. Here, we illustrate it through a specific example involving a data reduction model for an ICCD camera from a shock-tube experiment located at the NASA Ames Research Center (ARC).
