Regression diagnostics meets forecast evaluation: conditional calibration, reliability diagrams, and coefficient of determination
| Main Authors: | Tilmann Gneiting, Johannes Resin |
|---|---|
| Format: | Article (Journal) |
| Language: | English |
| Published: | 2023 |
| In: | Electronic journal of statistics, Year: 2023, Volume: 17, Issue: 2, Pages: 3226-3286 |
| ISSN: | 1935-7524 |
| DOI: | 10.1214/23-EJS2180 |
| Online Access: | Publisher, free of charge, full text: https://doi.org/10.1214/23-EJS2180; Publisher, free of charge, full text: https://projecteuclid.org/journals/electronic-journal-of-statistics/volume-17/issue-2/Regression-diagnostics-meets-forecast-evaluation--conditional-calibration-reliability-diagrams/10.1214/23-EJS2180.full |
| Author Notes: | Tilmann Gneiting and Johannes Resin |
| Summary: | A common principle in model diagnostics and forecast evaluation is that fitted or predicted distributions ought to be reliable, ideally in the sense of auto-calibration, where the outcome is a random draw from the posited distribution. For binary responses, auto-calibration is the universal concept of reliability. For real-valued outcomes, a general theory of calibration has been elusive, despite a recent surge of interest in distributional regression and machine learning. We develop a framework rooted in probability theory, which gives rise to hierarchies of calibration, and applies to both predictive distributions and stand-alone point forecasts. In a nutshell, a prediction is conditionally T-calibrated if it can be taken at face value in terms of an identifiable functional T. We introduce population versions of T-reliability diagrams and revisit a score decomposition into measures of miscalibration, discrimination, and uncertainty. In empirical settings, stable and efficient estimators of T-reliability diagrams and score components arise via nonparametric isotonic regression and the pool-adjacent-violators algorithm. For in-sample model diagnostics, we propose a universal coefficient of determination that nests and reinterprets the classical R² in least squares regression and its natural analog R¹ in quantile regression, yet applies to T-regression in general. |
| Item Description: | Viewed on 10 December 2024 |
| Physical Description: | Online Resource |
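
The summary mentions estimating reliability diagrams and score components with isotonic regression and the pool-adjacent-violators (PAV) algorithm. The following is a minimal sketch of that idea for the mean functional under squared error, not the authors' implementation. The function names (`recalibrate`, `decompose`) are hypothetical, and the formulas for the miscalibration (MCB), discrimination (DSC), and uncertainty (UNC) components and the ratio `(DSC - MCB) / UNC` are our reading of the decomposition and should be checked against the article.

```python
def recalibrate(x, y):
    """Isotonic (PAV) regression of outcomes y on point forecasts x.

    Returns the recalibrated forecast for each entry of x.
    """
    # Pool tied forecast values first: key -> [sum of y, count].
    groups = {}
    for xi, yi in zip(x, y):
        g = groups.setdefault(xi, [0.0, 0])
        g[0] += yi
        g[1] += 1

    # Pool adjacent violators over the sorted distinct forecast values.
    blocks = []  # each block: [sum of y, count, forecast values covered]
    for xv in sorted(groups):
        s, n = groups[xv]
        blocks.append([s, n, [xv]])
        # Merge while the previous block's mean exceeds the last block's mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s2, n2, xs2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += n2
            blocks[-1][2] += xs2

    level = {xv: s / n for s, n, xs in blocks for xv in xs}
    return [level[xi] for xi in x]


def mse(pred, y):
    """Mean squared error of predictions against outcomes."""
    return sum((p - v) ** 2 for p, v in zip(pred, y)) / len(y)


def decompose(x, y):
    """Assumed decomposition: mean score = MCB - DSC + UNC."""
    x_rc = recalibrate(x, y)           # recalibrated forecasts
    ref = [sum(y) / len(y)] * len(y)   # constant reference forecast (mean of y)
    s_x, s_rc, s_ref = mse(x, y), mse(x_rc, y), mse(ref, y)
    mcb = s_x - s_rc                   # miscalibration
    dsc = s_ref - s_rc                 # discrimination
    unc = s_ref                        # uncertainty
    return {"score": s_x, "MCB": mcb, "DSC": dsc, "UNC": unc,
            "R_star": (dsc - mcb) / unc}


if __name__ == "__main__":
    forecasts = [1, 2, 3, 4]
    outcomes = [1, 3, 2, 4]
    # Pairs (forecast, recalibrated forecast) trace a reliability curve.
    print(list(zip(forecasts, recalibrate(forecasts, outcomes))))
    print(decompose(forecasts, outcomes))
```

Plotting the recalibrated values against the original forecasts gives an empirical reliability curve, and deviations from the diagonal indicate miscalibration. Under this reading, `R_star` equals one minus the ratio of the mean score to UNC, which matches the classical R² when `x` holds in-sample least squares fitted values.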