Regression diagnostics meets forecast evaluation: conditional calibration, reliability diagrams, and coefficient of determination

Bibliographic Details
Main Authors: Gneiting, Tilmann (Author); Resin, Johannes (Author)
Format: Article (Journal)
Language: English
Published: 2023
In: Electronic journal of statistics
Year: 2023, Volume: 17, Issue: 2, Pages: 3226-3286
ISSN: 1935-7524
DOI: 10.1214/23-EJS2180
Online Access: Publisher, free of charge, full text: https://doi.org/10.1214/23-EJS2180
Publisher, free of charge, full text: https://projecteuclid.org/journals/electronic-journal-of-statistics/volume-17/issue-2/Regression-diagnostics-meets-forecast-evaluation--conditional-calibration-reliability-diagrams/10.1214/23-EJS2180.full
Author Notes: Tilmann Gneiting and Johannes Resin
Description
Summary: A common principle in model diagnostics and forecast evaluation is that fitted or predicted distributions ought to be reliable, ideally in the sense of auto-calibration, where the outcome is a random draw from the posited distribution. For binary responses, auto-calibration is the universal concept of reliability. For real-valued outcomes, a general theory of calibration has been elusive, despite a recent surge of interest in distributional regression and machine learning. We develop a framework rooted in probability theory, which gives rise to hierarchies of calibration, and applies to both predictive distributions and stand-alone point forecasts. In a nutshell, a prediction is conditionally T-calibrated if it can be taken at face value in terms of an identifiable functional T. We introduce population versions of T-reliability diagrams and revisit a score decomposition into measures of miscalibration, discrimination, and uncertainty. In empirical settings, stable and efficient estimators of T-reliability diagrams and score components arise via nonparametric isotonic regression and the pool-adjacent-violators algorithm. For in-sample model diagnostics, we propose a universal coefficient of determination that nests and reinterprets the classical R² in least squares regression and its natural analog R¹ in quantile regression, yet applies to T-regression in general.
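
The estimation step named in the summary can be illustrated with a minimal sketch for one special case: the mean functional scored by squared error. Everything here is an assumption for illustration, not the authors' implementation: the function names (`pav_recalibrate`, `decompose`) are hypothetical, the three score components are taken to be differences between the mean scores of the original forecast, the PAV-recalibrated forecast, and the constant marginal-mean forecast, and the ratio (DSC - MCB)/UNC is assumed as the form of the coefficient of determination.

```python
from statistics import fmean


def pav_recalibrate(x, y):
    """Isotonic least-squares fit of outcomes y on forecasts x via
    pool-adjacent-violators; fitted values are returned in input order."""
    order = sorted(range(len(x)), key=lambda i: x[i])

    # Pool tied forecasts first so equal forecasts share one fitted value.
    groups = []  # each entry: [forecast value, sum of outcomes, indices]
    for i in order:
        if groups and groups[-1][0] == x[i]:
            groups[-1][1] += y[i]
            groups[-1][2].append(i)
        else:
            groups.append([x[i], y[i], [i]])

    # PAV: merge neighbouring blocks while their means decrease.
    blocks = []  # each entry: [sum of outcomes, indices]
    for _, group_sum, group_idx in groups:
        blocks.append([group_sum, list(group_idx)])
        while (len(blocks) > 1 and
               blocks[-2][0] * len(blocks[-1][1])
               > blocks[-1][0] * len(blocks[-2][1])):
            last_sum, last_idx = blocks.pop()
            blocks[-1][0] += last_sum
            blocks[-1][1].extend(last_idx)

    fit = [0.0] * len(x)
    for block_sum, block_idx in blocks:
        for i in block_idx:
            fit[i] = block_sum / len(block_idx)
    return fit


def mean_squared_error(pred, y):
    return fmean((p - v) ** 2 for p, v in zip(pred, y))


def decompose(x, y):
    """Assumed decomposition: score = MCB - DSC + UNC (squared error)."""
    s_x = mean_squared_error(x, y)                       # original forecast
    s_c = mean_squared_error(pav_recalibrate(x, y), y)   # recalibrated
    s_r = mean_squared_error([fmean(y)] * len(y), y)     # marginal mean
    mcb, dsc, unc = s_x - s_c, s_r - s_c, s_r
    # Assumed form of the coefficient; requires non-constant outcomes.
    return {"score": s_x, "MCB": mcb, "DSC": dsc, "UNC": unc,
            "R*": (dsc - mcb) / unc}


# Toy data, made up for illustration only.
forecasts = [0.4, 0.1, 0.3, 0.2]
outcomes = [1, 0, 0, 1]
parts = decompose(forecasts, outcomes)
```

Plotting the pairs (forecast, recalibrated value) in increasing order of the forecast gives an empirical reliability diagram under these assumptions; points on the diagonal indicate a calibrated forecast.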
Item Description: Viewed on 10 December 2024
Physical Description: Online Resource