The Value of Information in Retrospect
Abstract: Any statistical analysis must address questions of data quality and model appropriateness. Value of information methods were first proposed in the middle of the twentieth century as a framework for choosing between potential sources of information, but since their genesis they have been largely neglected by statisticians. In this paper we review and extend existing value of information methods and recommend three quantities for identifying influential and outlying data: an influence measure previously suggested by \cite{kempthorne1986}; a related quantity, the expected value of sample information, which gauges how much influence we would expect a portion of the data to have; and the ratio of these two quantities, which compares observed influence with expected influence. We study the basic theoretical properties of these quantities and illustrate the proposed approach using two datasets. A data set containing employment rates and other economic factors in the U.S., first presented by \cite{longley}, provides an example for linear regression. In low- and middle-income countries, HIV surveillance data collected from prenatal clinics have been the main source of information for monitoring the HIV epidemic; a data set on HIV prevalence in Swaziland provides an example for generalized linear mixed models.