Discrimination and calibration of models predicting risk
Risk prediction is common in medicine, eg the Framingham Risk Score and the Pooled Cohort Equation/10-year ASCVD risk prediction models. Machine learning models that also predict risk are growing in popularity.
Discrimination and calibration are discussed in this excellent JAMA article. In brief, discrimination relates to how a model divvies up low and high risk individuals. So, in a population of folks of various risk levels, high discriminating models will score higher risk folks higher than low risk folks. For example, a CVD model should say that a 60 year old with hypertension, hyperlipidemia, and diabetes has a higher risk of CVD than a 20 year old with none of those conditions. Calibration, on the other hand, describes how well a model predicts risk for an individual person.
Discrimination descriptive labels
I have to look this up every time, so I am putting this here. Here’s a widely-used labeling for discrimination, which in this manuscript, we called it the “Hosmer-Lemeshow criteria”. These authors applied this to area under the ROC (AUROC) curves, but I think it’s also fair to apply to C-statistics.
- 0.5: No discrimination
- 0.5 to <0.7: Poor discrimination
- 0.7 to <0.8: Acceptable discrimination
- 0.8 to <0.9: Excellent discrimination
- ≥0.9: Outstanding discrimination
The reference is here:
Hosmer JD, Lemeshow S, Sturdivant R. Applied Logistic Regression. Hoboken, NJ, USA: John Wiley & Sons Inc; 2013.
Here’s the Google Books listing for this, in case you want to grab the metadata for this reference using Zotero or whatnot. You’ll see the above labels on page 177 — you can see this with the “Preview” view.