On the Evaluation of Probability Forecasts: An Application to Qualitative Choice Models

Using data from the Nielsen HomeScan scanner panel for calendar year 2003, we develop binary choice models of the decision by a sample of U.S. households to purchase various non-alcoholic beverages. We evaluate the probabilities generated by those qualitative choice models using an array of techniques: expectation-prediction success tables; the receiver operating characteristic (ROC) curve; the Kullback-Leibler information criterion; calibration; resolution (sorting); the Brier score; and the Yates partition of the Brier score. In using expectation-prediction success tables, we pay attention to sensitivity and specificity. Using a naïve 0.50 cut-off to classify probabilities over- or underestimated sensitivity and specificity relative to using the market penetration value as the cut-off. The area under the ROC curve is suggested as an alternative to classifying probabilities at either the 0.50 cut-off or the market penetration level, because it considers a wide range of cut-off probabilities and yields a single coherent measure. The area under the ROC curve was highest for coffee for within-sample probabilities and highest for the fruit juice model for out-of-sample probabilities. The Kullback-Leibler information criterion, which selects the model with the highest log-likelihood function value at out-of-sample observations (OSLLF), shows the overall “closeness” or deviation of model-generated probabilities to the true data-generating probability, although it does not classify probabilities for events that occurred versus those that did not. Again, with respect to the OSLLF value, probabilities from the fruit juice model outperform those for all other beverages. Forecast probabilities for most of the beverage purchases were well calibrated.
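The cut-off comparison, the area under the ROC curve, and the out-of-sample log-likelihood described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function names and the eight forecast/outcome pairs are hypothetical, the penetration cut-off is assumed to be the observed share of purchases, and the area under the curve is computed through its rank interpretation (the share of purchase/non-purchase pairs that the model orders correctly).

```python
import math

def sensitivity_specificity(probs, outcomes, cutoff):
    """Classify a forecast as 'purchase' when prob >= cutoff, then score it."""
    pairs = list(zip(probs, outcomes))
    tp = sum(1 for p, y in pairs if y == 1 and p >= cutoff)
    fn = sum(1 for p, y in pairs if y == 1 and p < cutoff)
    tn = sum(1 for p, y in pairs if y == 0 and p < cutoff)
    fp = sum(1 for p, y in pairs if y == 0 and p >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(probs, outcomes):
    """Share of (purchase, non-purchase) pairs ranked correctly; ties count one half."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def out_of_sample_log_likelihood(probs, outcomes):
    """Sum of log forecast probabilities assigned to the outcomes that occurred."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p)
               for p, y in zip(probs, outcomes))

# Hypothetical hold-out forecasts and purchase indicators (not from the paper).
probs = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
outcomes = [1, 1, 1, 0, 1, 0, 1, 0]

# Assumed definition: market penetration = observed share of purchasers.
penetration = sum(outcomes) / len(outcomes)

for label, cutoff in [("naive 0.50", 0.50), ("penetration", penetration)]:
    sens, spec = sensitivity_specificity(probs, outcomes, cutoff)
    print(f"{label}: sensitivity={sens:.2f}, specificity={spec:.2f}")

print(f"AUC={roc_auc(probs, outcomes):.2f}")
print(f"OSLLF={out_of_sample_log_likelihood(probs, outcomes):.3f}")
```

On this toy data the 0.50 cut-off gives sensitivity 0.80 and specificity of about 0.67, while the penetration cut-off (0.625) gives 0.60 and 1.00; the area under the curve, 0.80, does not depend on either choice.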
All resolution graphs were almost flat relative to the 45-degree perfect-resolution line, indicating poor sorting power of the choice models. The Brier score was lowest for fruit juices and highest for low-fat milk. According to the Brier score, probability forecasts for fruit juices outperformed those for other non-alcoholic beverages. Although the Brier score gave an overall indication of a model's ability to forecast accurately, the components of the Yates decomposition of the Brier score provided a clearer and broader indication of that ability. Within-sample probabilities generated through the logit model for coffee outperform those generated for other beverages based on area under the ROC curve, the covariance between probabilities and the outcome index, and the slope of the covariance. Out-of-sample probabilities generated through the logit model for fruit juice perform better than those for any other beverage category based on area under the ROC curve, the Brier score, and the OSLLF value. When researchers are confronted with alternative models that issue probability forecasts, the accuracy of those forecasts can be measured through a myriad of metrics to determine the best model. Even though traditional measures such as expectation-prediction success tables, calibration, and log-likelihood approaches are still used, ROC charts, resolution, the Brier score, and the Yates partition of the Brier score are highly recommended for evaluating probabilities generated through alternative models.
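As a rough illustration of the Brier score and its Yates partition, the sketch below uses the covariance decomposition as it is commonly stated: Brier = Var(d) + MinVar(f) + Scat(f) + Bias^2 - 2 Cov(f, d), where f is the forecast probability and d the 0/1 outcome index. The function names and data are hypothetical and are not taken from the paper; population (1/N) variances are assumed, and both outcomes must appear in the sample.

```python
def mean(xs):
    return sum(xs) / len(xs)

def pop_var(xs):
    """Population (1/N) variance."""
    m = mean(xs)
    return mean([(x - m) ** 2 for x in xs])

def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    return mean([(p - y) ** 2 for p, y in zip(probs, outcomes)])

def yates_partition(probs, outcomes):
    """Components of the covariance decomposition of the Brier score."""
    n = len(probs)
    f1 = [p for p, y in zip(probs, outcomes) if y == 1]  # forecasts when event occurred
    f0 = [p for p, y in zip(probs, outcomes) if y == 0]  # forecasts when it did not
    d_bar = mean(outcomes)
    var_d = d_bar * (1.0 - d_bar)        # outcome index variance (outside the model's control)
    bias = mean(probs) - d_bar           # calibration-in-the-large
    slope = mean(f1) - mean(f0)          # separation of the two conditional means
    cov_fd = slope * var_d               # covariance between forecasts and outcomes
    min_var_f = slope ** 2 * var_d       # forecast variance needed to achieve that slope
    scatter_f = (len(f1) * pop_var(f1) + len(f0) * pop_var(f0)) / n  # excess forecast noise
    return {"var_d": var_d, "min_var_f": min_var_f, "scatter_f": scatter_f,
            "bias": bias, "slope": slope, "cov_fd": cov_fd}

# Hypothetical forecasts and purchase indicators (not from the paper).
probs = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
outcomes = [1, 1, 1, 0, 1, 0, 1, 0]

parts = yates_partition(probs, outcomes)
rebuilt = (parts["var_d"] + parts["min_var_f"] + parts["scatter_f"]
           + parts["bias"] ** 2 - 2.0 * parts["cov_fd"])

print(f"Brier score: {brier_score(probs, outcomes):.4f}")
print(f"Sum of Yates components: {rebuilt:.4f}")
for name, value in parts.items():
    print(f"  {name}: {value:.4f}")
```

A lower Brier score is better. Within the partition, a larger covariance and a slope closer to one indicate better sorting, while bias and scatter measure miscalibration and noise that a single overall score cannot separate.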

Publication Type: Conference Paper/Presentation
JEL Codes: C25; C52; D12

 Record created 2017-04-01, last modified 2017-08-22
